Abstract

In order to make graphical Gaussian models a viable modelling tool when the 
number of variables outgrows the number of observations, Højsgaard and Lauritzen
(2008) introduced model classes which place equality restrictions on concentrations 
or partial correlations. The models can be represented by vertex and edge coloured 
graphs. The need for model selection methods makes it imperative to understand 
the structure of model classes. We identify four model classes that form complete 
lattices of models with respect to model inclusion, which qualifies them for an 
Edwards-Havranek model selection procedure (Edwards and Havranek, 1987). Two 
classes turn out most suitable for a corresponding model search. We obtain an 
explicit search algorithm for one of them and provide a model search example for the other. 

Keywords: Conditional independence; Covariance selection; Invariance; Model selection; 
Patterned covariance matrices; Permutation symmetry 

1 Introduction 

Graphical models are probabilistic models which use graphs to represent dependencies between 
random variables. This article is concerned with models represented by undirected graphs, in 
which each variable corresponds to a vertex and a pair of vertices is joined by an edge unless 
the corresponding variables are conditionally independent given the remaining variables. In 
addition to providing a concise form of visualisation of the conditional independence struc- 
ture of a model, the graphical representation can be exploited to make statistical inference 
computations more efficient (Lauritzen, 1996). 

Motivated by the growing need for parsimonious models in modern applications, in
particular when the number of variables outgrows the number of observations, graphical
models with additional equality constraints on model parameters have attracted increasing
interest in recent years, in discrete models (Gottard et al., 2010; Ramirez-Aldana, 2010) as well as
in multivariate Gaussian models, which are the central object of interest in this article. First
studies (Højsgaard and Lauritzen, 2008; Uhler, 2010) show that equality constraints reduce the
minimal number of observations required to ensure estimability of the model parameters with
probability one, which makes graphical Gaussian models with equality constraints a promising
model class.

Symmetry constraints, induced by distribution invariance under a permutation group applied
to the variable labels, are a special instance of equality constraints and have a long history
for the Gaussian distribution before the advent of graphical models (Wilks, 1946; Votaw,
1948; Olkin and Press, 1969; Olkin, 1972; Andersson, 1975; Jensen, 1988). First studies of
models combining symmetry constraints with conditional independence relations are given in
Hylleberg et al. (1993), Andersen et al. (1995) and Madsen (2000). The models we study were
introduced in Højsgaard and Lauritzen (2008) and contain the models in Hylleberg et al.
(1993) as a special case. The types of restrictions are: equality between specified elements of the
concentration matrix (RCON) and equality between specified partial correlations (RCOR). The
models can be represented by vertex and edge coloured graphs, where parameters associated
with equally coloured vertices or edges are restricted to being identical.

In order for RCON and RCOR models to become widely applicable in practice, model
selection methods need to be developed, which motivates the study of model structures. This
is the main objective of this article. As both RCON and RCOR models form
complete lattices, both qualify for the Edwards-Havranek model selection procedure for lattices
(Edwards and Havranek, 1987). However, due to the large number of models, it is more feasible
to search, at least initially, in suitable subsets of the model space. Particularly favourable are
subsets of models in which equality constraints are placed in a pattern which makes them more
readily interpretable.

Four model classes with desirable statistical properties which express themselves in regularity
of graph colouring have been previously identified in the literature. The most restrictive
is given by graphical symmetry models studied in Hylleberg et al. (1993), also appearing in
Højsgaard and Lauritzen (2008) under the name RCOP models. The corresponding graph
colourings are given by vertex and edge orbits of a permutation group acting on the variable
labels and we therefore term them permutation-generated. Colourings representing models
which place the same equality constraints on the concentrations and partial correlations were
termed edge regular in Gehrmann and Lauritzen (2011). Two further model types, ensuring
estimability of a non-zero model mean subject to equality constraints, were introduced in
Gehrmann and Lauritzen (2011), the colourings representing them termed vertex regular and
regular respectively.

The main results presented in this article are that each of the model classes forms a 
complete lattice of models and the identification of their meet and join operations. The 
former is established by showing that each model class is stable under model intersection, 
which gives the shared meet operation, and by demonstrating that whenever a model does 
not fall inside a given class there is a unique smallest larger model, or supremum, which does, 
giving the distinct join operations. The found lattice structure qualifies each model class for 
an Edwards-Havranek model search, giving a first model selection procedure for RCON and 
RCOR models. 

We focus on models represented by edge regular and permutation-generated colourings as
their structure is generally more tractable and their constraints more readily interpretable.
We present an Edwards-Havranek model selection algorithm for models with edge regular
colourings and illustrate it by means of an example with five variables, with very encouraging
performance. We further provide an example of a model search within models with
permutation-generated colourings for a data set with four variables, commonly known as
Frets' heads (Frets, 1921; Mardia et al., 1979).

2 Preliminaries and Notation



2.1 Notation 

Let $G = (V, E)$ be an undirected uncoloured graph with vertex set $V$ and edge set $E$. For a
$|V| \times |V|$ matrix $A = (a_{\alpha\beta})$, $A(G)$ shall denote the matrix defined by $A(G)_{\alpha\beta} = 0$ whenever
$\alpha \neq \beta$ and there is no edge between $\alpha$ and $\beta$ in $G$, and $A(G)_{\alpha\beta} = a_{\alpha\beta}$ otherwise. For a set of
matrices $M$ we let $M^+$ denote the set of positive definite matrices inside $M$. $\mathcal{S}$ shall denote the
set of symmetric matrices, so that $\mathcal{S}^+$ denotes the set of symmetric positive definite matrices
and $\mathcal{S}^+(G)$ the set of symmetric positive definite matrices indexed by $V$ whose $\alpha\beta$-entry is zero
whenever $\alpha\beta \notin E$ and $\alpha \neq \beta$. We indicate that a matrix is symmetric by only writing its elements on
the diagonal and above. Asterisks as matrix entries indicate that the corresponding entry is
unconstrained apart from any restrictions stated explicitly.

For a discrete set $D$ we let $S(D)$ denote the symmetric group acting on $D$, consisting
of all permutations of the elements in $D$. For $F \subseteq S(D)$, $\langle F \rangle$ denotes the group generated by
$F$, containing all permutations which can be expressed as products of elements in $F$ and their
inverses. Permutations are written in cycle notation, meaning that $\sigma = (i_1 i_2 \ldots i_r)$ maps $i_j$ to
$i_{j+1}$ for $1 \le j < r$ and $i_r$ to $i_1$.

For a graph $G = (V,E)$, $\mathrm{Aut}(G)$ denotes the automorphism group of $G$, containing all
permutations in $S(V)$ which leave $G$ invariant. For a partition $P$ of a set $S$ and $a, b \in S$ we
write $a \equiv b \ (P)$ to denote that $a$ and $b$ lie in the same set in $P$. For $n \in \mathbb{N}$, we denote sets of
the form $\{1, 2, \ldots, n\}$ by $[n]$.



2.2 Graphical Gaussian Models 

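A graphical Gaussian model is concerned with the distribution of a multivariate random vector
$Y = (Y_\alpha)_{\alpha \in V}$. Let $G = (V,E)$ be an undirected graph with vertex set $V$ and edge set $E$.
Then the graphical Gaussian model represented by $G$ is given by assuming that $Y$ follows a
multivariate Gaussian $\mathcal{N}_{|V|}(\mu, \Sigma)$ distribution with concentration matrix $K = \Sigma^{-1} \in \mathcal{S}^+(G)$.

The entries in $K = (k_{\alpha\beta})_{\alpha,\beta \in V}$ have a simple interpretation. The diagonal elements $k_{\alpha\alpha}$
are reciprocals of the conditional variances given the remaining variables

$$k_{\alpha\alpha} = \mathrm{Var}(Y_\alpha \mid Y_{V \setminus \{\alpha\}})^{-1}$$

for $\alpha \in V$. The scaled elements of the concentration matrix

$$c_{\alpha\beta} = \frac{k_{\alpha\beta}}{\sqrt{k_{\alpha\alpha} k_{\beta\beta}}} \tag{1}$$

for $\alpha, \beta \in V$ are the negative partial correlation coefficients

$$\rho_{\alpha\beta \mid V \setminus \{\alpha,\beta\}} = \frac{\mathrm{Cov}(Y_\alpha, Y_\beta \mid Y_{V \setminus \{\alpha,\beta\}})}{\mathrm{Var}(Y_\alpha \mid Y_{V \setminus \{\alpha,\beta\}})^{1/2} \, \mathrm{Var}(Y_\beta \mid Y_{V \setminus \{\alpha,\beta\}})^{1/2}} = -c_{\alpha\beta} \tag{2}$$

for $\alpha, \beta \in V$. It follows that

$$\alpha\beta \notin E \iff k_{\alpha\beta} = 0 \iff Y_\alpha \perp\!\!\!\perp Y_\beta \mid Y_{V \setminus \{\alpha,\beta\}} \tag{3}$$

see e.g. Chapter 5 in Lauritzen (1996) for further details.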
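
The relation between equations (1) and (2) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the matrix `K` is a hypothetical concentration matrix for a path on three vertices, and the function names are illustrative. It relies on the standard fact that the conditional covariance of $(Y_\alpha, Y_\beta)$ given the remaining variables is the inverse of the $2 \times 2$ block of $K$ indexed by $\{\alpha, \beta\}$.

```python
import math

def scaled_concentration(K, a, b):
    """Scaled concentration c_ab = k_ab / sqrt(k_aa * k_bb), as in equation (1)."""
    return K[a][b] / math.sqrt(K[a][a] * K[b][b])

def partial_correlation(K, a, b):
    """Partial correlation of Y_a and Y_b given all remaining variables.

    The conditional covariance of (Y_a, Y_b) given the rest is the inverse
    of the 2x2 block of K indexed by {a, b}; we invert that block by hand.
    """
    kaa, kab, kbb = K[a][a], K[a][b], K[b][b]
    det = kaa * kbb - kab * kab
    var_a, var_b, cov_ab = kbb / det, kaa / det, -kab / det
    return cov_ab / math.sqrt(var_a * var_b)

# Hypothetical concentration matrix for the path 0 - 1 - 2 (no edge between 0 and 2).
K = [[2.0, -0.8, 0.0],
     [-0.8, 2.0, 0.5],
     [0.0, 0.5, 1.5]]

rho_01 = partial_correlation(K, 0, 1)   # equals -c_01
rho_02 = partial_correlation(K, 0, 2)   # zero, since k_02 = 0
```

In this sketch the missing edge between vertices 0 and 2 corresponds to a zero concentration and hence a zero partial correlation, in line with equation (3).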

2.3 Graph Colouring

For general graph terminology we refer to Bollobás (1998). Following Højsgaard and Lauritzen
(2008), for $G = (V,E)$ an undirected graph, a vertex colouring of $G$ is a partition $\mathcal{V} =
\{V_1, \ldots, V_k\}$ of $V$, where we refer to $V_1, \ldots, V_k$ as the vertex colour classes. Similarly, an edge
colouring of $G$ is a partition $\mathcal{E} = \{E_1, \ldots, E_l\}$ of $E$ into $l$ edge colour classes $E_1, \ldots, E_l$. A
colour class with a single element is atomic and a colour class which is not atomic is composite.
We let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ denote the coloured graph with vertex colouring $\mathcal{V}$ and edge colouring $\mathcal{E}$ and
let $(\mathcal{V}, \mathcal{E})$ denote its graph colouring. For $\mathcal{V}$ and $\mathcal{E}$ as above and $u \in \mathcal{V}$, we let $T^u$ denote the
$|V| \times |V|$ diagonal matrix with $T^u_{\alpha\alpha} = 1$ if and only if $\alpha \in u$ and zero otherwise. Similarly for
$u \in \mathcal{E}$, we let $T^u$ be the symmetric $|V| \times |V|$ matrix with $T^u_{\alpha\beta} = 1$ if and only if $\alpha\beta \in u$ and
zero otherwise.

In our display of vertex and edge coloured graphs, we indicate the colour class of a vertex 
by the number of asterisks we place next to it. Similarly we indicate the colour class of an 
edge by dashes. Vertices and edges which are displayed in black correspond to atomic colour 
classes. 

2.4 Lattices 

A binary relation $\rho$ on a set $A$ is a subset of $A \times A$, with two elements $a, b \in A$ being in
relation with respect to $\rho$ if and only if $(a, b) \in \rho$, which we denote by $a \, \rho \, b$. If $\rho$ is reflexive
[$a \, \rho \, a$ $\forall a \in A$], antisymmetric [$a \, \rho \, b$ and $b \, \rho \, a \Rightarrow a = b$ $\forall a, b \in A$] and transitive [$a \, \rho \, b$ and
$b \, \rho \, c \Rightarrow a \, \rho \, c$ $\forall a, b, c \in A$], it is a partial ordering relation and $A$ a partially ordered set or
poset. We denote a poset $A$ with partial ordering relation $\rho$ by $(A; \rho)$, abbreviated to simply
$A$ if the binary relation is clear.

For $H \subseteq A$ and $a \in A$, $a$ is an upper bound of $H$ if $h \le a$ for all $h \in H$. An upper bound $a$ of $H$
is the least upper bound or supremum of $H$ if every upper bound $b$ of $H$ satisfies $a \le b$, and we then write
$a = \sup H$. Lower bound and greatest lower bound or infimum, denoted $\inf H$, are defined
similarly. $\sup \emptyset$ is the smallest element in $A$, called zero, if it exists, and $\inf \emptyset$ is the largest
element in $A$, called unit, if it exists.

A poset $(L; \le)$ is a lattice if $\inf H$ and $\sup H$ exist for any finite nonempty subset $H$ of $L$.
It is called complete if $\inf H$ and $\sup H$ also exist for $H = \emptyset$. A poset can be shown to be a
complete lattice with the following result.

Lemma 2.1. If $(P; \le)$ is a poset in which $\inf H$ exists for all $H \subseteq P$, then $(P; \le)$ is a complete
lattice.

For a lattice $L$ and $a, b \in L$, we write $a \wedge b$ for $\inf\{a, b\}$ and $a \vee b$ for $\sup\{a, b\}$, and refer to
$\wedge$ as the meet operation and to $\vee$ as the join operation. $L$ is distributive if for all $a, b, c \in L$,

$$a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c) \tag{4}$$

The structure of a lattice $(L; \le)$ may be visualised by a Hasse diagram, in which each
element pair $a, b \in L$ is joined by an edge whenever $a < b$ and there is no $x \in L \setminus \{a, b\}$ such
that $a < x < b$.

We denote most partial orderings by $\le$. Which partial ordering the symbol refers to will
be determined by the context. For an overview of lattice theory see Grätzer (1998).

3 Model Types: RCON and RCOR Models



3.1 RCON Models: Equality Restrictions on Concentrations 

RCON models are graphical Gaussian models which place equality constraints on the entries
of the concentration matrix $K = \Sigma^{-1}$. For a model whose conditional independence structure
is represented by graph $G = (V,E)$, the restrictions can be represented by a graph colouring
$(\mathcal{V}, \mathcal{E})$, with the vertex colouring $\mathcal{V}$ representing constraints on the entries on the diagonal of
$K$ and the edge colouring $\mathcal{E}$ representing constraints on the off-diagonal entries. Whenever
two vertices $\alpha, \beta \in V$ belong to the same vertex colour class, the corresponding two diagonal
entries $k_{\alpha\alpha}$ and $k_{\beta\beta}$ are restricted to being identical. Similarly, two edges $\alpha\beta, \gamma\delta \in E$ of the
same colour represent the constraint $k_{\alpha\beta} = k_{\gamma\delta}$.

We denote the set of positive definite matrices which satisfy such constraints for a graph
colouring $(\mathcal{V}, \mathcal{E})$ by $\mathcal{S}^+(\mathcal{V}, \mathcal{E})$. Put formally, the distribution of a random vector $Y \in \mathbb{R}^V$ is said
to lie in the RCON model represented by the coloured graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ if

$$Y \sim \mathcal{N}_{|V|}(0, \Sigma), \quad K = \Sigma^{-1} \in \mathcal{S}^+(\mathcal{V}, \mathcal{E}) = \Big\{ \sum_{u \in \mathcal{V} \cup \mathcal{E}} \lambda_u T^u, \ \lambda \in \mathbb{R}^{\mathcal{V} \cup \mathcal{E}} \Big\}^+ \tag{5}$$

Since the constraints are linear in $K$, by standard exponential family theory (Brown,
1986), just as unconstrained graphical Gaussian models, RCON models are regular exponential
families. Thus the maximum likelihood estimate of $\lambda$ is uniquely determined, provided it
exists. For the corresponding computation algorithm see Højsgaard and Lauritzen (2008) and
Højsgaard and Lauritzen (2007). Note that RCON models are instances of models considered
in Anderson (1970).

Example 3.1. The data consist of the examination marks of 88 students in the mathematical
subjects Algebra, Analysis, Mechanics, Statistics and Vectors (Mardia et al., 1979). Whittaker
(1990) and Edwards (2000) previously demonstrated an excellent fit to the unconstrained model
represented by the graph shown in Figure 1(a). Højsgaard and Lauritzen (2008) show the
data to also support the RCON model represented by the graph in Figure 1(b). The model
specifications are

$$Y \sim \mathcal{N}_5(0, \Sigma), \quad \Sigma^{-1} \in M$$

with $M$ given below the corresponding graphs. If the subjects are indexed in alphabetical
order, the graph colouring representing the constraints of the RCON model is given by $(\mathcal{V}, \mathcal{E})$ with
$\mathcal{V} = \{\{1\}, \{2, 5\}, \{3, 4\}\}$ and $\mathcal{E} = \{\{12\}, \{13, 14, 15, 24, 35\}\}$. Note that the number of model
parameters has been reduced from 11 to 5.
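
To illustrate how a colouring determines the form of $K$ in equation (5), the following minimal sketch assembles $K = \sum_u \lambda_u T^u$ for the colouring stated in Example 3.1. The parameter values in `lambdas` are hypothetical (chosen so that $K$ is diagonally dominant and hence positive definite) and are not estimates from the data; the function names are illustrative.

```python
def colour_matrix(n, colour_class):
    """Return T^u for a colour class u over vertices 1..n.

    A vertex class is a set of ints; an edge class is a set of (a, b) tuples.
    """
    T = [[0] * n for _ in range(n)]
    for item in colour_class:
        if isinstance(item, tuple):      # edge ab: symmetric off-diagonal ones
            a, b = item
            T[a - 1][b - 1] = T[b - 1][a - 1] = 1
        else:                            # vertex a: diagonal one
            T[item - 1][item - 1] = 1
    return T

def rcon_concentration(n, classes, lambdas):
    """K = sum over colour classes u of lambda_u * T^u."""
    K = [[0.0] * n for _ in range(n)]
    for u, lam in zip(classes, lambdas):
        T = colour_matrix(n, u)
        for i in range(n):
            for j in range(n):
                K[i][j] += lam * T[i][j]
    return K

# Colouring from Example 3.1 (subjects indexed alphabetically).
vertex_classes = [{1}, {2, 5}, {3, 4}]
edge_classes = [{(1, 2)}, {(1, 3), (1, 4), (1, 5), (2, 4), (3, 5)}]
classes = vertex_classes + edge_classes

lambdas = [3.0, 2.0, 2.5, -0.4, -0.6]    # hypothetical parameter values
K = rcon_concentration(5, classes, lambdas)

n_unconstrained = 5 + sum(len(u) for u in edge_classes)  # one parameter per vertex and edge
n_rcon = len(classes)                                    # one parameter per colour class
```

The two counts reproduce the reduction from 11 to 5 parameters noted in the example.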

Figure 1: Mathematics marks example. (a) Inferred conditional independence structure of
Mathematics marks data. (b) RCON model supported by Mathematics marks data.



3.2 RCOR Models: Equality Restrictions on Partial Correlations 

RCOR models place symmetry restrictions on the diagonal elements of the concentration
matrix $K = \Sigma^{-1}$ and on the partial correlations as given in equation (2). Just as for RCON
models, for a model with graph $G = (V,E)$, the constraints can be represented by a graph
colouring $(\mathcal{V}, \mathcal{E})$: vertices of the same colour represent restrictions on the diagonal entries of
$K$ (exactly as in RCON models), and whenever two edges $\alpha\beta, \gamma\delta \in E$ belong to the same edge
colour class in $\mathcal{E}$, the corresponding partial correlations $\rho_{\alpha\beta \mid V \setminus \{\alpha,\beta\}}$ and $\rho_{\gamma\delta \mid V \setminus \{\gamma,\delta\}}$, defined in
equation (2), are restricted to being identical.

We denote the set of positive definite matrices which satisfy such restrictions for a graph
colouring $(\mathcal{V}, \mathcal{E})$ by $\mathcal{R}^+(\mathcal{V}, \mathcal{E})$. If we let the $|V| \times |V|$ matrices $A = (a_{\alpha\beta})$ and $C = (c_{\alpha\beta})$ be
given by $a_{\alpha\beta} = \sqrt{k_{\alpha\alpha}}$ for $\alpha = \beta$ and zero otherwise, and let $c_{\alpha\beta}$ be as in equation (1) for $\alpha \neq \beta$
and $c_{\alpha\beta} = 1$ otherwise, then $K = ACA$ and the distribution of a random vector $Y \in \mathbb{R}^V$ lies
in the RCOR model represented by the coloured graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ if

$$Y \sim \mathcal{N}_{|V|}(0, \Sigma), \quad K = \Sigma^{-1} \in \mathcal{R}^+(\mathcal{V}, \mathcal{E}) = \Big\{ ACA \ : \ A = \sum_{u \in \mathcal{V}} \eta_u T^u, \ \eta \in \mathbb{R}_+^{\mathcal{V}}, \ C = I + \sum_{u \in \mathcal{E}} \tau_u T^u, \ \tau \in (-1,1)^{\mathcal{E}} \Big\}^+ \tag{6}$$

Thus the constraints of RCOR models define a differentiable manifold in $\mathcal{S}^+$, which makes
them curved exponential families (Brown, 1986). Therefore, the maximum likelihood estimates
of $\eta$ and $\tau$, if they exist, may not be unique. For a discussion and computation algorithm we
refer to Højsgaard and Lauritzen (2008) and Højsgaard and Lauritzen (2007).

RCOR models are scale invariant if variables inside the same vertex colour class are
manipulated in the same way (Højsgaard and Lauritzen, 2008). Thus they are particularly
suitable for variables measured on different scales.
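
The parametrisation $K = ACA$ in equation (6) can be sketched in a few lines. The colouring and the values of `eta` and `tau` below are hypothetical and only serve to show that edges in the same colour class share a partial correlation even when the corresponding concentrations differ; positive definiteness of the resulting matrix is not checked here.

```python
import math

def rcor_concentration(n, vertex_classes, eta, edge_classes, tau):
    """K = A C A with A = diag(eta_u per vertex class), C = I + sum_u tau_u T^u."""
    a = [0.0] * n
    for u, e in zip(vertex_classes, eta):
        for v in u:
            a[v - 1] = e
    C = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, t in zip(edge_classes, tau):
        for (p, q) in u:
            C[p - 1][q - 1] = C[q - 1][p - 1] = t
    return [[a[i] * C[i][j] * a[j] for j in range(n)] for i in range(n)]

def partial_corr(K, p, q):
    """Partial correlation given the rest, i.e. minus the scaled concentration."""
    return -K[p - 1][q - 1] / math.sqrt(K[p - 1][p - 1] * K[q - 1][q - 1])

# Hypothetical colouring of the four-cycle 1-2-3-4-1: all vertex classes atomic,
# edges 12 and 34 share one colour, edges 23 and 14 are atomic.
vertex_classes = [{1}, {2}, {3}, {4}]
edge_classes = [{(1, 2), (3, 4)}, {(2, 3)}, {(1, 4)}]
eta = [1.0, 2.0, 0.5, 1.5]      # hypothetical square roots of the diagonal of K
tau = [-0.3, -0.2, 0.1]         # hypothetical scaled concentrations per edge class

K = rcor_concentration(4, vertex_classes, eta, edge_classes, tau)
```

Here the partial correlations for edges 12 and 34 coincide while $k_{12} \neq k_{34}$, so the same colouring read as an RCON model would impose a different restriction.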

We highlight that both RCON and RCOR models generally do not place the same equality
restrictions on $\Sigma$ as they do on $\Sigma^{-1}$ and on partial correlations.

Example 3.2. The data concern trait and state versions of anxiety and anger measured on 684
students (Cox and Wermuth, 1993) and strongly support the conditional independence model
displayed in Figure 2(a). As shown in Højsgaard and Lauritzen (2008), they also support the
RCOR model represented by the coloured graph in Figure 2(b), parametrised by 6 parameters
rather than 8. The variable names are combinations of T or S, for "trait" and "state",
and X or N, standing for "anxiety" and "anger". The model specifications are

$$Y \sim \mathcal{N}_4(0, \Sigma), \quad \Sigma^{-1} \in M$$

with $M$ given below the graphs. The variables are indexed anti-clockwise starting from TX.

Figure 2: Personality characteristics example. (a) Conditional independence structure supported
by personality characteristics data. (b) RCOR model supported by personality characteristics data.



3.3 Number of RCON and RCOR Models 

Let $\mathcal{S}_V$ and $\mathcal{R}_V$ denote the sets of RCON and RCOR models with variable set $V$ and let $\mathcal{C}_V$
be the set of vertex and edge coloured graphs with vertex set $V$. Further, let $\mathcal{M}_V$ be the set
of unconstrained graphical Gaussian models with variable set $V$ and $\mathcal{U}_V$ the set of undirected
graphs with vertex set $V$.

As, by equations (5) and (6), for both model types there is one model parameter for each
vertex colour class in $\mathcal{V}$ and one for each edge colour class in $\mathcal{E}$ in the coloured dependence
graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, there are as many RCON and RCOR models with variables $V$ as there are
coloured graphs with vertex set $V$, i.e., $|\mathcal{S}_V| = |\mathcal{R}_V| = |\mathcal{C}_V|$. Given that the number of
graph colourings of a particular graph $G = (V,E)$ is given by the product $|P(V)||P(E)|$ of the
number of partitions of $V$ multiplied by the number of partitions of $E$, we obtain

$$|\mathcal{C}_V| = \sum_{G=(V,E) \in \mathcal{U}_V} |P(V)||P(E)| = |P(V)| \sum_{G=(V,E) \in \mathcal{U}_V} |P(E)| \tag{7}$$

For a discrete set $S$ of size $d$, $|P(S)|$ is given by the $d$th Bell number $B_d$ (Bell, 1934; Pitman,
1997). The Bell numbers satisfy the recursive relationship $B_{d+1} = \sum_{k=0}^{d} \binom{d}{k} B_k$, with $B_0 = 1$. Hence,

$$|\mathcal{S}_V| = |\mathcal{R}_V| = B_{|V|} \sum_{G=(V,E) \in \mathcal{U}_V} B_{|E|} = B_{|V|} \sum_{k=0}^{\binom{|V|}{2}} \binom{\binom{|V|}{2}}{k} B_k$$

For each $d$, $B_d$ can be evaluated as the least integer greater than the sum of the first $2d$ terms
in Dobinski's formula (Dobiński, 1877; Comtet, 1974)

$$B_d = \frac{1}{e} \sum_{k=0}^{\infty} \frac{k^d}{k!} = \frac{1}{e} \left( \frac{0^d}{0!} + \frac{1^d}{1!} + \frac{2^d}{2!} + \cdots \right) \tag{8}$$

so that clearly $|\mathcal{S}_V| = |\mathcal{R}_V|$ grows super-exponentially in $|V|$. For illustration, observe that
while $|\mathcal{M}_{[4]}| = 64$ and $|\mathcal{M}_{[5]}| = 1{,}024$, $|\mathcal{S}_{[4]}| = |\mathcal{R}_{[4]}| = 13{,}155$ and $|\mathcal{S}_{[5]}| = |\mathcal{R}_{[5]}| = 35{,}285{,}640$.
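
The counts above can be reproduced directly from the Bell number recursion. The following is a minimal sketch with illustrative function names; `bell_dobinski` follows the evaluation rule quoted above and uses floating-point arithmetic, so it should only be trusted for small $d$.

```python
from math import comb, e, factorial, ceil

def bell_numbers(n_max):
    """Bell numbers B_0..B_{n_max} via B_{d+1} = sum_k C(d, k) * B_k."""
    B = [1]
    for d in range(n_max):
        B.append(sum(comb(d, k) * B[k] for k in range(d + 1)))
    return B

def count_coloured_graphs(n):
    """Number of vertex and edge coloured graphs on n labelled vertices."""
    m = comb(n, 2)                       # number of possible edges
    B = bell_numbers(max(n, m))
    # Choose k of the m possible edges, then partition them into colour classes;
    # the number of vertex colourings B_n factors out.
    return B[n] * sum(comb(m, k) * B[k] for k in range(m + 1))

def bell_dobinski(d):
    """Least integer above the first 2d terms of Dobinski's formula (floating point)."""
    partial = sum(k ** d / factorial(k) for k in range(2 * d)) / e
    return ceil(partial)

n_models_4 = count_coloured_graphs(4)
n_models_5 = count_coloured_graphs(5)
```

For comparison, the number of unconstrained graphical models on the same vertex sets is `2 ** comb(4, 2)` and `2 ** comb(5, 2)`.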

3.4 Structure of the Sets of RCON and RCOR Models 

It is a well-known fact that $\mathcal{M}_V$ is a complete distributive lattice with respect to model
inclusion, with partial ordering induced by the partial ordering on $\mathcal{U}_V$ given by edge set
inclusion: for $G_1 = (V,E_1), G_2 = (V,E_2) \in \mathcal{U}_V$, $G_1 \le G_2$ whenever $E_1 \subseteq E_2$, with $G_1 \wedge G_2 =
(V, E_1 \cap E_2)$ and $G_1 \vee G_2 = (V, E_1 \cup E_2)$. For $M_1, M_2 \in \mathcal{M}_V$ represented by $G_1, G_2$ as above,
$M_1 \subseteq M_2$ if and only if $G_1 \le G_2$. The zero in $(\mathcal{U}_V; \le)$ is the empty graph, and the unit the
complete graph, in which every edge is present.
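
As a small sanity check of the distributive law (4) for this lattice, the sketch below enumerates all undirected graphs on three labelled vertices, identifies each with its edge set, and verifies the law for every triple. The choice of three vertices is arbitrary and only keeps the enumeration small.

```python
from itertools import combinations

vertices = [1, 2, 3]
all_edges = list(combinations(vertices, 2))

# Every undirected graph on the vertex set is identified with its edge set.
graphs = [frozenset(c)
          for r in range(len(all_edges) + 1)
          for c in combinations(all_edges, r)]

def meet(E1, E2):
    """Meet in the lattice of graphs: intersection of edge sets."""
    return E1 & E2

def join(E1, E2):
    """Join in the lattice of graphs: union of edge sets."""
    return E1 | E2

distributive = all(
    join(a, meet(b, c)) == meet(join(a, b), join(a, c))
    for a in graphs for b in graphs for c in graphs
)
```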

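RCON and RCOR models are specified by partitions of $V$ and $E$. For any finite discrete
set $S$, the set $P(S)$ of partitions of $S$ forms a complete non-distributive lattice, with $P_1 \le P_2$
for $P_1, P_2 \in P(S)$ whenever $P_1$ is finer than $P_2$, or, put differently, whenever $P_2$ is coarser
than $P_1$, i.e., if every set in $P_2$ can be expressed as a union of sets in $P_1$. This allows the
identification of a partial ordering $\preceq$ on $\mathcal{C}_V$ which corresponds to model inclusion in $\mathcal{S}_V$ and
$\mathcal{R}_V$. For $\mathcal{G} = (\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}), \mathcal{H} = (\mathcal{V}_{\mathcal{H}}, \mathcal{E}_{\mathcal{H}}) \in \mathcal{C}_V$ with underlying uncoloured graphs $G$ and $H$,
$\mathcal{G} \preceq \mathcal{H}$ whenever

(i) $G \le H$; (ii) $\mathcal{V}_{\mathcal{G}} \ge \mathcal{V}_{\mathcal{H}}$; (iii) every colour class in $\mathcal{E}_{\mathcal{G}}$ is a union of colour classes in $\mathcal{E}_{\mathcal{H}}$.

Put in words, if we let $M_{\mathcal{G}}, M_{\mathcal{H}}$ denote two RCON or RCOR models (both of the same
type) represented by $\mathcal{G}, \mathcal{H} \in \mathcal{C}_V$, then $M_{\mathcal{G}} \subseteq M_{\mathcal{H}}$ if $\mathcal{H}$ can be obtained from $\mathcal{G}$ by splitting
colour classes and adding new edge colour classes, or equivalently if $\mathcal{G}$ can be obtained from $\mathcal{H}$
by merging colour classes and dropping edge colour classes.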

For example, for the graphs $\mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$ for $i = 1, 2, 3$ in Figure 3, $\mathcal{G}_1 \preceq \mathcal{G}_2$ as conditions
(i)-(iii) above are satisfied whereas $\mathcal{G}_1 \not\preceq \mathcal{G}_3$ because (ii) and (iii) are violated. Thus the
corresponding RCON or RCOR models $M_1, M_2, M_3$ (all of the same type) satisfy $M_1 \subseteq M_2$
and $M_1 \not\subseteq M_3$.
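
Conditions (i)-(iii) translate into a short comparison routine. The sketch below is illustrative only: the three coloured graphs are hypothetical colourings of a path on three vertices rather than the graphs of Figure 3, and a coloured graph is represented as a pair of lists of colour classes.

```python
def edge(a, b):
    """An undirected edge as an unordered pair."""
    return frozenset((a, b))

def is_union_of(cls, partition):
    """True if `cls` is a union of blocks of `partition`."""
    covered = set()
    for block in partition:
        if block <= cls:
            covered |= block
        elif block & cls:          # a block straddling cls rules out a union
            return False
    return covered == set(cls)

def leq(G, H):
    """True if coloured graph G precedes coloured graph H in the ordering above."""
    (VG, EG), (VH, EH) = G, H
    edges_G = set().union(*EG)
    edges_H = set().union(*EH)
    return (edges_G <= edges_H                               # (i) edge set inclusion
            and all(is_union_of(u, VH) for u in VG)          # (ii) V_G coarser than V_H
            and all(is_union_of(u, EH) for u in EG))         # (iii) edge classes are unions

# Hypothetical coloured graphs on the vertex set {1, 2, 3}.
H = ([frozenset({1}), frozenset({2}), frozenset({3})],
     [frozenset({edge(1, 2)}), frozenset({edge(2, 3)})])     # path, all classes atomic
G = ([frozenset({1, 3}), frozenset({2})],
     [frozenset({edge(1, 2), edge(2, 3)})])                  # path, classes merged
G3 = ([frozenset({1, 2}), frozenset({3})],
      [frozenset({edge(1, 2)})])                             # single edge 12
```

In this example `G` and `G3` both lie below `H`, while `G` and `G3` are incomparable.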




Figure 3: Partial ordering in Cm. 

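It is then straightforward to show that $(\mathcal{C}_V; \preceq)$ is a complete lattice with meet and join
operations

$$\mathcal{G} \wedge \mathcal{H} = (\mathcal{V}_{\mathcal{G}} \vee \mathcal{V}_{\mathcal{H}}, \ \mathcal{E}'_{\mathcal{G}} \vee \mathcal{E}'_{\mathcal{H}}) \quad \text{and} \quad \mathcal{G} \vee \mathcal{H} = (\mathcal{V}_{\mathcal{G}} \wedge \mathcal{V}_{\mathcal{H}}, \ \mathcal{E}^*_{\mathcal{G}} \wedge \mathcal{E}^*_{\mathcal{H}}) \tag{9}$$

where $\mathcal{E}'_{\mathcal{G}} \subseteq \mathcal{E}_{\mathcal{G}}$ and $\mathcal{E}'_{\mathcal{H}} \subseteq \mathcal{E}_{\mathcal{H}}$ are maximal with the property that they are partitions of the
same set of edges inside $E_G \cap E_H$, $\mathcal{E}^*_{\mathcal{G}} = \mathcal{E}_{\mathcal{G}} \cup \{\{E_H \setminus E_G\}\}$ and $\mathcal{E}^*_{\mathcal{H}} = \mathcal{E}_{\mathcal{H}} \cup \{\{E_G \setminus E_H\}\}$. The
graphs in Figure 4 illustrate the operations. The zero in $(\mathcal{C}_V; \preceq)$ is given by the empty graph
in which all vertices are of the same colour and the unit is the complete graph with atomic
colour classes.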




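Proposition 3.3. Let $\mathcal{G} = (\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}), \mathcal{H} = (\mathcal{V}_{\mathcal{H}}, \mathcal{E}_{\mathcal{H}}) \in \mathcal{C}_V$ and let $\mathcal{S}^+(\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}), \mathcal{S}^+(\mathcal{V}_{\mathcal{H}}, \mathcal{E}_{\mathcal{H}}) \in \mathcal{S}_V$
and $\mathcal{R}^+(\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}), \mathcal{R}^+(\mathcal{V}_{\mathcal{H}}, \mathcal{E}_{\mathcal{H}}) \in \mathcal{R}_V$ be the RCON and RCOR models represented by $\mathcal{G}$ and $\mathcal{H}$.
Then

$$\mathcal{S}^+(\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}) \subseteq \mathcal{S}^+(\mathcal{V}_{\mathcal{H}}, \mathcal{E}_{\mathcal{H}}) \iff \mathcal{G} \preceq \mathcal{H} \iff \mathcal{R}^+(\mathcal{V}_{\mathcal{G}}, \mathcal{E}_{\mathcal{G}}) \subseteq \mathcal{R}^+(\mathcal{V}_{\mathcal{H}}, \mathcal{E}_{\mathcal{H}})$$

and $\mathcal{S}_V$ and $\mathcal{R}_V$ are complete non-distributive lattices with meet and join operations induced
by the meet and join operations in $(\mathcal{C}_V; \preceq)$ given in equation (9).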

4 Model Classes within RCON and RCOR Models 

The motivation to study model classes strictly within the sets of RCON and RCOR models 
is three-fold: firstly, having demonstrated that the number of RCON and RCOR models 
grows dramatically with the number of variables, especially for model selection, smaller model 
(search) spaces are desirable. Secondly, generic equality constraints of RCON and RCOR 
models are generally not readily interpretable and, lastly, do not guarantee the corresponding 
model to have any particular statistical properties. 

Four model classes within the sets of RCON and RCOR models which are characterised 
by desirable statistical properties expressing themselves in regularity of colouring have 
been previously identified in the literature. This section is devoted to their definition and 
first properties. Three of the four colouring regularities were termed edge regularity, with
the corresponding models appearing in Højsgaard and Lauritzen (2008), vertex regularity
and regularity in Gehrmann and Lauritzen (2011). We term colourings of the fourth type
permutation-generated. The corresponding models are referred to as graphical symmetry
models in Hylleberg et al. (1993) and as RCOP models in Højsgaard and Lauritzen (2008).

4.1 Models Represented by Edge Regular Colourings 

RCON and RCOR models place restrictions on different parameter sets, which translates into
different model properties. While the restrictions in RCON models ensure that the models
are regular exponential families, RCOR models are scale invariant within vertex colour classes.
Thus if a graph colouring $(\mathcal{V}, \mathcal{E})$ yields the same model restrictions when representing the constraints
of an RCON model as it does when representing those of an RCOR model, it represents a model with
both of the above desirable properties. Such models can be identified by their graph colouring.



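Definition 4.1. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ we say that $(\mathcal{V}, \mathcal{E})$ is edge regular if any pair of edges in
the same edge colour class in $\mathcal{E}$ connects the same vertex colour classes in $\mathcal{V}$.

It then holds:

Proposition 4.2 (Højsgaard and Lauritzen (2008)). The RCON and RCOR models determined
by $(\mathcal{V}, \mathcal{E})$ yield identical restrictions

$$\mathcal{S}^+(\mathcal{V}, \mathcal{E}) = \mathcal{R}^+(\mathcal{V}, \mathcal{E})$$

if and only if $(\mathcal{V}, \mathcal{E})$ is edge regular.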

We provide a simple example for illustration. While the colouring in Figure 5(a) is edge 
regular (both green edges (single dash) connect a blue vertex (single asterisk) to a red one (two 
asterisks), and the same is true for the purple edges (two dashes)), the colouring in Figure 5(b) 
is not, as the green edges appear between different pairs of vertex colours. 
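
Edge regularity in the sense of Definition 4.1 is easy to test mechanically. The sketch below is illustrative: the first two colourings are hypothetical and are not the graphs of Figure 5, while the third is the colouring stated in Example 3.1.

```python
def is_edge_regular(vertex_classes, edge_classes):
    """True if all edges in each edge colour class join the same vertex colour classes."""
    colour_of = {v: i for i, cls in enumerate(vertex_classes) for v in cls}
    for cls in edge_classes:
        # Unordered pair of vertex colours joined by each edge in the class.
        colour_pairs = {frozenset(colour_of[v] for v in e) for e in cls}
        if len(colour_pairs) > 1:
            return False
    return True

# Hypothetical colouring of the four-cycle 1-2-3-4-1: every edge joins the two vertex classes.
cycle_regular = is_edge_regular(
    [{1, 3}, {2, 4}],
    [{(1, 2), (3, 4)}, {(2, 3), (1, 4)}],
)

# Hypothetical colouring of the path 1-2-3: equally coloured edges join different vertex colours.
path_regular = is_edge_regular(
    [{1}, {2}, {3}],
    [{(1, 2), (2, 3)}],
)

# Colouring from Example 3.1 (Mathematics marks).
marks_regular = is_edge_regular(
    [{1}, {2, 5}, {3, 4}],
    [{(1, 2)}, {(1, 3), (1, 4), (1, 5), (2, 4), (3, 5)}],
)
```

Under this check the colouring of Example 3.1 is not edge regular, since for instance edges 13 and 15 lie in the same colour class but join vertex 1 to different vertex colour classes.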




Figure 5: An edge regular colouring and one which is not edge regular. (a) Edge regular
colouring. (b) A colouring which is not edge regular.



4.2 Models Represented by Vertex Regular Colourings 

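Vertex regular colourings are of relevance to the estimation of a non-zero mean vector $\mu$ in a
$\mathcal{N}_{|V|}(\mu, \Sigma)$ distribution if $\mu$ is subject to equality constraints and $\Sigma^{-1}$ is restricted to lie inside
$\mathcal{S}^+(\mathcal{V}, \mathcal{E})$ or inside $\mathcal{R}^+(\mathcal{V}, \mathcal{E})$ for some coloured graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.

Proposition 4.3 (Gehrmann and Lauritzen (2011)). Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ and let $\mathcal{M}$ be a
partition of $V$. For $\alpha \in V$ let $v_\alpha$ denote the set in $\mathcal{M}$ which contains $\alpha$ and let

$$\Omega = \Omega(\mathcal{M}) = \{(x_\alpha)_{\alpha \in V} \in \mathbb{R}^V : x_\alpha = x_\beta \text{ whenever } \alpha \equiv \beta \ (\mathcal{M})\}$$

Further let $(Y^i)_{1 \le i \le n}$ be a sample of independent identically distributed observations $Y^i \sim
\mathcal{N}_{|V|}(\mu, \Sigma)$ with $\mu$ restricted to lie inside $\Omega$. Then the following are equivalent

(i) the likelihood function based on $(y^i)_{1 \le i \le n}$ is maximised in $\mu$ by the least-squares estimator
$\mu^*$ for all $\Sigma$ with $\Sigma^{-1} \in \mathcal{S}^+(\mathcal{V}, \mathcal{E})$ or with $\Sigma^{-1} \in \mathcal{R}^+(\mathcal{V}, \mathcal{E})$, where

$$\mu^*_\alpha = \frac{1}{|v_\alpha| \, n} \sum_{\beta \in v_\alpha} \sum_{i=1}^{n} y^i_\beta \tag{10}$$

(ii) $\mathcal{M}$ is finer than $\mathcal{V}$ and $(\mathcal{M}, \mathcal{E})$ is vertex regular.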

For the definition of a vertex regular colouring we require the concept of an equitable
partition, first defined in Sachs (1966). For an undirected graph $G = (V, E)$, a vertex colouring
$\mathcal{V}$ of $V$ is called equitable with respect to $G$ if for all $v, w \in \mathcal{V}$ and all $\alpha, \beta \in v$, we have
$|\mathrm{ne}(\alpha) \cap w| = |\mathrm{ne}(\beta) \cap w|$, where $\mathrm{ne}(\alpha)$ denotes the set of neighbours of $\alpha$ in $G$. Vertex regular
graph colourings are the analogue to equitable partitions for vertex and edge coloured graphs.

Definition 4.4. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ let the subgraph induced by the edge colour class $u \in \mathcal{E}$
be denoted by $G^u = (V, u)$. We say that $(\mathcal{V}, \mathcal{E})$ is vertex regular if $\mathcal{V}$ is equitable with respect
to $G^u$ for all $u \in \mathcal{E}$.

While the colouring in Figure 6(a) is vertex regular, the colouring in Figure 6(b) is not.
The former has only one edge colour class, so that it is vertex regular if and only if its vertex
colouring is equitable with respect to $G$, which it is. The colouring on the right cannot be
vertex regular: vertices 2 and 4 are of the same colour, yet vertex 4 is incident to a purple
edge (two dashes) whereas vertex 2 is not.
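
Definition 4.4 can likewise be checked mechanically. The sketch below is illustrative: both colourings are hypothetical colourings of a four-cycle and are not the graphs of Figure 6, although the second mimics the kind of violation described above.

```python
def is_equitable(vertex_classes, edges):
    """True if the vertex partition is equitable with respect to the graph with these edges."""
    nbrs = {v: set() for cls in vertex_classes for v in cls}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    for v in vertex_classes:
        for w in vertex_classes:
            # All vertices in v must have the same number of neighbours in w.
            counts = {len(nbrs[a] & w) for a in v}
            if len(counts) > 1:
                return False
    return True

def is_vertex_regular(vertex_classes, edge_classes):
    """True if the vertex colouring is equitable for every edge colour class subgraph."""
    return all(is_equitable(vertex_classes, u) for u in edge_classes)

vertex_classes = [{1, 3}, {2, 4}]

# Hypothetical colouring of the four-cycle 1-2-3-4-1 that is vertex regular.
regular_case = is_vertex_regular(
    vertex_classes, [{(1, 2), (3, 4)}, {(2, 3), (1, 4)}])

# Hypothetical colouring where the atomic edge class {14} touches vertex 4 but not vertex 2.
irregular_case = is_vertex_regular(
    vertex_classes, [{(1, 2), (2, 3), (3, 4)}, {(1, 4)}])
```

With a single edge colour class, `is_vertex_regular` reduces to `is_equitable` on the whole graph, matching the remark on Figure 6(a).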



Figure 6: A vertex regular colouring and one which is not vertex regular. (a) Vertex regular
colouring. (b) A colouring which is not vertex regular.



4.3 Models Represented by Regular Colourings 

RCON or RCOR models with restrictions represented by colourings which are both edge regular 
and vertex regular combine the properties of both model classes. It can be shown that the 
colourings of such models are precisely those which in the terminology of Siemons (1983) are 
regular: 

Definition 4.5 (Siemons (1983)). For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$, $(\mathcal{V}, \mathcal{E})$ is regular if

(i) every pair of equally coloured edges in $\mathcal{E}$ connects the same vertex colour classes in $\mathcal{V}$;

(ii) every pair of equally coloured vertices in $\mathcal{V}$ has the same degree in every edge colour class
in $\mathcal{E}$.

By the above, the colourings shown in Figure 5(b) and Figure 6(b) cannot be regular. 
While the colouring given in Figure 6(a) is regular, the colouring in Figure 5(a) is not. 

4.4 Models Represented by Permutation-Generated Colourings 

Permutation-generated colourings are a special instance of regular colourings (for a proof see
Gehrmann and Lauritzen (2011)), and thus by definition also of edge regular and vertex regular
colourings. They represent models in which equality constraints on the parameters are induced
by permutation symmetry and allow a particularly simple maximisation of the likelihood
function. In brief, maximum likelihood estimates can be obtained by standard methods
for unconstrained models after taking averages within colour classes (Højsgaard and Lauritzen,
2008). Further, models represented by permutation-generated colourings form the only model
class discussed here which restricts $\Sigma^{-1}$ and $\Sigma$ in the same fashion.

The corresponding models are defined through distribution invariance under a permutation
group $\Gamma$ acting on the variable labels $V$. If $Y \sim \mathcal{N}_{|V|}(0, \Sigma)$, then permutations acting on $V$
simultaneously permute rows and columns of $\Sigma$ so that the distribution of $Y$ is invariant under
$\Gamma \subseteq S(V)$ if and only if

$$P(\sigma) \Sigma P(\sigma)^T = \Sigma \iff P(\sigma) \Sigma = \Sigma P(\sigma) \iff P(\sigma) \Sigma^{-1} = \Sigma^{-1} P(\sigma) \tag{11}$$

for all $\sigma \in \Gamma$, where for $\alpha, \beta \in V$, $P(\sigma)_{\alpha\beta} = 1$ if and only if $\sigma$ maps $\beta$ to $\alpha$ and zero otherwise.
A necessary condition for equation (11) to hold is that the zero entries in $\Sigma^{-1}$ are preserved
for all $\Sigma$ in the model and all $\sigma \in \Gamma$. Thus if the distribution of $Y$ is assumed to lie in the
graphical Gaussian model represented by graph $G = (V,E)$, by equation (3), we in particular
require that $\Gamma \subseteq \mathrm{Aut}(G)$.
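
The commutation condition for the concentration matrix can be verified numerically for a given permutation. The sketch below is illustrative: the four-cycle, the permutation `sigma` and the entries of `K` are hypothetical, with vertices indexed 0 to 3.

```python
def perm_matrix(sigma, n):
    """P(sigma)[a][b] = 1 if and only if sigma maps b to a (vertices 0..n-1)."""
    P = [[0] * n for _ in range(n)]
    for b in range(n):
        P[sigma[b]][b] = 1
    return P

def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_invariant(K, sigma):
    """True if P(sigma) K = K P(sigma)."""
    P = perm_matrix(sigma, len(K))
    return matmul(P, K) == matmul(K, P)

# Hypothetical four-cycle 0-1-2-3-0 and the permutation swapping 0 with 3 and 1 with 2.
sigma = [3, 2, 1, 0]

# Hypothetical concentration matrix that is constant on the vertex and edge orbits of sigma.
K = [[2.0, -0.5, 0.0, -0.4],
     [-0.5, 3.0, -0.7, 0.0],
     [0.0, -0.7, 3.0, -0.5],
     [-0.4, 0.0, -0.5, 2.0]]

# Breaking the equality of the two diagonal entries in one vertex orbit destroys invariance.
K_broken = [row[:] for row in K]
K_broken[0][0] = 2.5
```

Because `P(sigma)` has entries 0 and 1 only, both products merely rearrange the entries of `K`, so the exact equality comparison is safe here.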

Therefore, in the notation of Højsgaard and Lauritzen (2008), a graphical Gaussian model
with conditional independence structure represented by graph $G$ which is permutation invariant
under the group $\Gamma \subseteq \mathrm{Aut}(G)$ is given by assuming

$$\Sigma^{-1} \in \mathcal{S}^+(G) \cap \mathcal{S}^+(\Gamma)$$

where $\mathcal{S}^+(\Gamma)$ is the set of positive definite matrices $\Sigma$ satisfying the equivalent conditions in
equation (11).

By definition, permutation invariant models place constraints on all model parameters and
thus in particular on concentrations and partial correlations, which they restrict in the same
fashion. Thus symmetry constraints in permutation invariant models can be represented by a
vertex and edge colouring $(\mathcal{V}, \mathcal{E})$ of $G$ given by the orbits of $\Gamma$ in $V$ and $E$ respectively, i.e., by
giving two vertices $\alpha, \beta \in V$ the same colour whenever there exists $\sigma \in \Gamma$ mapping $\alpha$ to $\beta$,
and similarly for the edges. We term such colourings permutation-generated, formally defined
below.

Definition 4.6. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ with underlying uncoloured graph $G = (V, E)$ we say
that $(\mathcal{V}, \mathcal{E})$ is permutation-generated if there exists a group $\Gamma \subseteq \mathrm{Aut}(G)$ acting on $V$ such that
$\mathcal{V}$ and $\mathcal{E}$ are given by the orbits of $\Gamma$ in $V$ and $E$ respectively.
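
A permutation-generated colouring can be computed from a set of generators by collecting orbits. The sketch below is illustrative: the graph is assumed to be a four-cycle on the labels `L1`, `B1`, `B2`, `L2`, and the single generator swaps `L1` with `L2` and `B1` with `B2`; both the graph and the helper names are assumptions made for this example.

```python
def orbits(points, generators, act):
    """Partition `points` into orbits of the group generated by `generators`.

    For a finite permutation group, closing under the generators alone suffices,
    since the inverse of each generator is one of its powers.
    """
    remaining, result = set(points), []
    while remaining:
        start = remaining.pop()
        orbit, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for g in generators:
                y = act(g, x)
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        result.append(frozenset(orbit))
        remaining -= orbit
    return result

def edge(a, b):
    return frozenset((a, b))

def act_vertex(g, v):
    return g[v]

def act_edge(g, e):
    return frozenset(g[v] for v in e)

# Assumed four-cycle L1 - B1 - B2 - L2 - L1.
vertices = ["L1", "B1", "B2", "L2"]
edges = [edge("L1", "B1"), edge("B1", "B2"), edge("B2", "L2"), edge("L2", "L1")]

# Generator swapping the two L labels and the two B labels.
swap = {"L1": "L2", "L2": "L1", "B1": "B2", "B2": "B1"}

is_automorphism = all(act_edge(swap, e) in edges for e in edges)
vertex_orbits = orbits(vertices, [swap], act_vertex)
edge_orbits = orbits(edges, [swap], act_edge)
```

The vertex orbits give the vertex colouring and the edge orbits give the edge colouring, so this example yields two vertex colour classes and three edge colour classes.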

The following example illustrates that in addition to the aforementioned desirable statistical 
properties, models with permutation-generated restrictions allow a very intuitive interpreta- 
tion. 

Example 4.7. The data, commonly referred to as Frets' heads, are concerned with the head
dimensions of 25 pairs of first and second sons (Frets, 1921; Mardia et al., 1979). Previous
analyses (Whittaker, 1990) support a model represented by the graph in Figure 7(a), where $L_i$
and $B_i$ denote the head length and head breadth of son $i$ for $i = 1, 2$. Højsgaard and Lauritzen
(2008) showed the model generated by $\Gamma = \langle (B_1 B_2)(L_1 L_2) \rangle$, corresponding to permuting the
two sons, represented by the first graph in Figure 7(b), to be an excellent fit.

Another model with constraints generated by permutation symmetry which fits the data
very well is the complete symmetry model, generated by $\Gamma = S(V)$, which is represented
by the second coloured graph in Figure 7(b). Interestingly, it is further favourable over the
former with regards to parameter estimation. While the graph on the left in Figure 7(b) is
non-decomposable and symmetry arguments combined with results in Buhl (1993) give that
at least 2 observations are required for almost sure existence of $\hat{\Sigma}$, see also Uhler (2010), the
complete symmetry model only requires one observation for $\hat{\Sigma}$ to exist almost surely.



Figure 7: Frets' heads example. (a) Conditional independence structure supported by Frets'
heads data. (b) Permutation-generated colourings representing symmetry constraints supported
by Frets' heads data.



4.5 Relations Between Model Classes 

Let $B$, $P$, $R$ and $\Pi$ denote the sets of edge regular, vertex regular, regular and
permutation-generated colourings respectively. The structural relations between colouring
classes are summarised in the diagram displayed in Figure 8. In fact, we already saw
examples of colourings in three of the four disjoint sets in the diagram. The colouring
displayed in Figure 5(b) lies in $P \setminus R$, Figure 7(b) shows a colouring in $\Pi$ and the colouring
in Figure 6(b) lies in $B \setminus R$. (A graph colouring in $R \setminus \Pi$ is given by $(\mathcal{V}, \mathcal{E})$ with
$V = [11]$, $\mathcal{V} = \{\{1, 2, 3\}, \{4, 5, 6, 7, 8, 9\}, \{10, 11\}\}$ and
$\mathcal{E} = \{\{14, 15, 26, 27, 38, 39\}, \{(4, 10), (5, 10), (6, 10), (7, 11), (8, 11), (9, 11)\}\}$,
where $ij$, or $(i, j)$ for two-digit labels, denotes an edge between vertices $i$ and $j$.)




Figure 8: Structural relations between colouring classes. 

By Proposition 4.2, a graph colouring yields the same model restrictions representing an 
RCON model as it does representing an RCOR model if and only if it lies in B. Therefore, the 
model type only needs to be specified whenever a graph colouring lies in P\B. Put formally: 

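Proposition 4.8. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$, $\mathcal{S}^+(\mathcal{V}, \mathcal{E}) = \mathcal{R}^+(\mathcal{V}, \mathcal{E})$ for
$(\mathcal{V}, \mathcal{E}) \in B$ and $\mathcal{S}^+(\mathcal{V}, \mathcal{E}) \neq \mathcal{R}^+(\mathcal{V}, \mathcal{E})$ for $(\mathcal{V}, \mathcal{E}) \in P \setminus B$.
Thus if for $X \in \{B, P, R, \Pi\}$ we let $\mathcal{S}^+_X$ denote the set of RCON models represented by
graphs in $X$ and similarly for $\mathcal{R}^+_X$ and RCOR models, then

$\mathcal{S}^+_B = \mathcal{R}^+_B$, $\mathcal{S}^+_R = \mathcal{R}^+_R$, $\mathcal{S}^+_\Pi = \mathcal{R}^+_\Pi$, $\mathcal{S}^+_P \neq \mathcal{R}^+_P$

giving rise to five model classes lying strictly within $\mathcal{S}^+_V$ and $\mathcal{R}^+_V$ with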

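$\mathcal{S}^+_\Pi \subset \mathcal{S}^+_R \subset \mathcal{S}^+_B$, $\mathcal{S}^+_B \cap \mathcal{S}^+_P = \mathcal{S}^+_B \cap \mathcal{R}^+_P = \mathcal{S}^+_R$.

Let $B_V$, $P_V$, $R_V$ and $\Pi_V$ denote the sets of graph colourings inside $B$, $P$, $R$ and $\Pi$ with
vertex set $V$. By Proposition 4.8, there are five corresponding model classes: $\mathcal{S}^+_{B_V}$,
$\mathcal{S}^+_{P_V}$, $\mathcal{R}^+_{P_V}$, $\mathcal{S}^+_{R_V}$ and $\mathcal{S}^+_{\Pi_V}$. For illustration we give the corresponding
model class sizes for $V = [4]$ together with $|\mathcal{M}_{[4]}|$ in Table 1.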



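Model class   $\mathcal{S}^+_{[4]}$, $\mathcal{R}^+_{[4]}$   $\mathcal{S}^+_{B_{[4]}}$   $\mathcal{S}^+_{P_{[4]}}$, $\mathcal{R}^+_{P_{[4]}}$   $\mathcal{S}^+_{R_{[4]}}$   $\mathcal{S}^+_{\Pi_{[4]}}$   $\mathcal{M}_{[4]}$
Size          13,155                     3065            1380                         251             251              64

Table 1: Model set sizes for $V = [4]$.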
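Two of the entries in Table 1 can be checked by elementary counting. The sketch below assumes
that the entry 13,155 counts all vertex and edge coloured graphs on four vertices, i.e. a
partition of the vertex set combined with a choice of edge set and a partition of that edge set,
and that the entry 64 counts the uncoloured graphs on four vertices. The helper `bell_numbers`
and all variable names are hypothetical.

```python
from math import comb


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], where B_n counts the partitions of an n-set."""
    bell = [1]
    for n in range(n_max):
        # Recurrence B_{n+1} = sum_k C(n, k) * B_k.
        bell.append(sum(comb(n, k) * bell[k] for k in range(n + 1)))
    return bell


n_vertices = 4
n_pairs = comb(n_vertices, 2)          # number of possible edges
bell = bell_numbers(n_pairs)

vertex_colourings = bell[n_vertices]   # partitions of the vertex set
# Choose an edge set with m edges, then partition it into colour classes.
edge_colourings = sum(comb(n_pairs, m) * bell[m] for m in range(n_pairs + 1))

coloured_graphs = vertex_colourings * edge_colourings
uncoloured_graphs = 2 ** n_pairs
```

Under the stated assumptions this gives 15 vertex colourings and 877 edge colourings, whose
product agrees with the first entry of the table.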



The relative sizes in Table 1 are representative of the general case: $\mathcal{S}^+_{B_V}$ is the largest
model class, followed by $\mathcal{S}^+_{P_V}$, $\mathcal{R}^+_{P_V}$ and $\mathcal{S}^+_{R_V}$. $|\mathcal{S}^+_{P_V}| = |\mathcal{R}^+_{P_V}|$ will
generally be considerably smaller than $|\mathcal{S}^+_{B_V}|$ as the defining conditions of $P_V$ are far
more restrictive than those for $B_V$. $\mathcal{S}^+_{\Pi_V}$ is the smallest class of the four for all $V$;
however, $|\mathcal{S}^+_{\Pi_V}|$ may equal $|\mathcal{S}^+_{R_V}|$ for some $V$, as for example for $V = [4]$.

5 Structures of Model Classes 

Below we show that each model class defined above forms a complete non-distributive lattice,
starting with $\mathcal{S}^+_{B_V}$ and $\mathcal{S}^+_{\Pi_V}$ as their structure turns out most tractable. For
brevity, we only outline results for the remaining model classes.

5.1 Models Represented by Edge Regular Colourings 

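Proposition 5.1. $B_V$ is stable under the meet operation $\wedge$ in $(\mathcal{C}_V; \preceq)$ given in
equation (9).

Proof: Let $\mathcal{G} = (\mathcal{V}_\mathcal{G}, \mathcal{E}_\mathcal{G}), \mathcal{H} = (\mathcal{V}_\mathcal{H}, \mathcal{E}_\mathcal{H}) \in B_V$.
$\mathcal{G} \wedge \mathcal{H} = (\mathcal{V}_{\mathcal{G} \wedge \mathcal{H}}, \mathcal{E}_{\mathcal{G} \wedge \mathcal{H}})$ is obtained from $\mathcal{G}$ and $\mathcal{H}$ by
dropping of edge colour classes and merging of colour classes. The only operation
potentially leading to $\mathcal{G} \wedge \mathcal{H}$ lying outside of $B_V$ is merging of edge colour classes.

So let $\alpha\beta$ and $\gamma\delta$ be two edges in $\mathcal{G} \wedge \mathcal{H}$ of equal colour. Then there exists a
sequence $\alpha_0\beta_0, \ldots, \alpha_k\beta_k$ in $E_\mathcal{G} \cap E_\mathcal{H}$ such that $\alpha_0\beta_0 = \alpha\beta$,
$\alpha_k\beta_k = \gamma\delta$, and $\alpha_{i-1}\beta_{i-1} \equiv \alpha_i\beta_i \ (\mathcal{E}_\mathcal{G})$ or
$\alpha_{i-1}\beta_{i-1} \equiv \alpha_i\beta_i \ (\mathcal{E}_\mathcal{H})$ for $1 \le i \le k$. As both $\mathcal{G}$ and $\mathcal{H}$ have edge
regular colourings, $\alpha_{i-1}\beta_{i-1}$ and $\alpha_i\beta_i$ connect the same vertex colour classes in the
graph in which they are of equal colour, which we denote by
$\{\alpha_{i-1}, \beta_{i-1}\} \equiv \{\alpha_i, \beta_i\} \ (\mathcal{V}_X)$ with $X \in \{\mathcal{G}, \mathcal{H}\}$. This gives that
$\{\alpha, \beta\} = \{\alpha_0, \beta_0\} \equiv \{\alpha_k, \beta_k\} = \{\gamma, \delta\} \ (\mathcal{V}_\mathcal{G} \vee \mathcal{V}_\mathcal{H})$. As
$\mathcal{V}_\mathcal{G} \vee \mathcal{V}_\mathcal{H} = \mathcal{V}_{\mathcal{G} \wedge \mathcal{H}}$, $\alpha\beta$ and $\gamma\delta$ connect the same vertex colour classes
in $\mathcal{G} \wedge \mathcal{H}$. □

Graphs $\mathcal{G}_4$, $\mathcal{G}_5$ and $\mathcal{G}_4 \wedge \mathcal{G}_5$ in Figure 4 illustrate the stability of $B_V$ under
$\wedge$. That $B_V$ is generally not stable under the join operation $\vee$ in equation (9) is
established by the example in Figure 9.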

Proposition 5.2. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ and let $\mathcal{E}_B$ be the partition of $E$ which puts
$\alpha\beta, \gamma\delta \in E$ in the same set whenever they connect the same vertex colour classes. Then
$\mathcal{G}$ has a supremum in $B_V$, given by

$s_B(\mathcal{G}) = (\mathcal{V}, \mathcal{E} \wedge \mathcal{E}_B)$

Figure 9: $B_V$ is not stable under $\vee$.

Proof: The claim is trivially true for $\mathcal{G} \in B_V$. If $\mathcal{G} \in \mathcal{C}_V \setminus B_V$, an edge regular
colouring cannot be achieved through splitting vertex colour classes or adding edge colour
classes. The only effective manipulation is therefore the splitting of edge colour classes. The
coarsest partition which is finer than $\mathcal{E}$ and gives an edge regular colouring is
$\mathcal{E} \wedge \mathcal{E}_B$, as it splits edge colour classes only if they connect different vertex colour
classes. □
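The construction in Proposition 5.2 amounts to splitting every edge colour class according to the
pair of vertex colour classes its edges connect. The following Python sketch illustrates this
under assumed data structures: a dictionary mapping each vertex to its colour and a list of edge
colour classes, each a set of vertex pairs. The function name `edge_regular_supremum` and the
small example are hypothetical.

```python
def edge_regular_supremum(vertex_colour, edge_classes):
    """Split each edge colour class by the vertex colour classes its edges connect.

    Returns the refined edge colouring as a set of frozensets of edges; the
    vertex colouring is left unchanged.
    """
    refined = set()
    for colour_class in edge_classes:
        parts = {}
        for a, b in colour_class:
            # Unordered pair of end-vertex colours (one element if they agree).
            key = frozenset((vertex_colour[a], vertex_colour[b]))
            parts.setdefault(key, set()).add((a, b))
        refined.update(frozenset(p) for p in parts.values())
    return refined


# Hypothetical example: vertices 1, 2 share colour "x"; vertices 3, 4 share "y".
vertex_colour = {1: "x", 2: "x", 3: "y", 4: "y"}
edge_classes = [{(1, 2), (1, 3), (2, 4)}, {(3, 4)}]
refined_classes = edge_regular_supremum(vertex_colour, edge_classes)
```

In the example the first edge colour class is not edge regular, since edge (1, 2) joins two
"x" vertices while (1, 3) and (2, 4) join an "x" vertex to a "y" vertex; the sketch therefore
separates (1, 2) from the other two edges.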

We deduce: 

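Theorem 5.3. $\mathcal{S}^+_{B_V}$ is a complete non-distributive lattice with respect to model inclusion.
The meet operation is induced by the meet operation in $(\mathcal{C}_V; \preceq)$ given in equation (9).
The join of two models represented by graphs $\mathcal{G}, \mathcal{H} \in B_V$ is represented by the graph
$s_B(\mathcal{G} \vee \mathcal{H})$.

Proof: Proposition 5.1 implies that $\inf H$ exists for all finite $H \subseteq B_V$, which by Lemma 2.1
gives that $B_V$ is a complete lattice, with the same meet operation as $(\mathcal{C}_V; \preceq)$. As the
zero and unit in $(\mathcal{C}_V; \preceq)$ have edge regular colourings, they are also the zero and unit in
$B_V$.

By definition, for $\mathcal{G}, \mathcal{H} \in B_V$, the join $\mathcal{K}$ of $\mathcal{G}$ and $\mathcal{H}$ in $B_V$ is the smallest graph
with respect to the partial ordering $\preceq$ which satisfies $\mathcal{K} \succeq \mathcal{G}$, $\mathcal{K} \succeq \mathcal{H}$ and
$\mathcal{K} \in B_V$. The supremum $\mathcal{G} \vee \mathcal{H}$ of $\mathcal{G}$ and $\mathcal{H}$ in $(\mathcal{C}_V; \preceq)$ is the smallest graph
satisfying the first two relations so that $\mathcal{K}$ is in fact the smallest graph satisfying
$\mathcal{K} \succeq (\mathcal{G} \vee \mathcal{H})$ and $\mathcal{K} \in B_V$. Thus $\mathcal{G} \vee_B \mathcal{H}$ is given by the supremum of
$\mathcal{G} \vee \mathcal{H}$ in $B_V$, which by Proposition 5.2 equals $s_B(\mathcal{G} \vee \mathcal{H})$.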

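Non-distributivity of $B_V$ is established by observing that equation (4) is violated for
$a = \mathcal{G}_6$, $b = \mathcal{G}_7$ and $c = \mathcal{G}_8$ displayed in Figure 10, as
$\mathcal{G}_6 = \mathcal{G}_6 \vee (\mathcal{G}_7 \wedge \mathcal{G}_8) \neq (\mathcal{G}_6 \vee \mathcal{G}_7) \wedge (\mathcal{G}_6 \vee \mathcal{G}_8) = \mathcal{G}_6 \vee \mathcal{G}_7$.

The results on the structure of $B_V$ naturally translate to the set of models $\mathcal{S}^+_{B_V}$,
proving the claim. □

Figure 10: Non-distributivity of $B_V$. Panels include $\mathcal{G}_7 \wedge \mathcal{G}_8$ and
$\mathcal{G}_6 \vee \mathcal{G}_7 = \mathcal{G}_6 \vee \mathcal{G}_8$.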



5.2 Models Represented by Permutation-Generated Colourings 

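Let $\Gamma_V$ denote the set of permutation groups acting on $V$. Then $(\Gamma_V; \subseteq)$ is a
complete lattice (Schmidt, 1994) with meet and join operations given by
$\Gamma_1 \wedge \Gamma_2 = \Gamma_1 \cap \Gamma_2$ and $\Gamma_1 \vee \Gamma_2 = \langle \Gamma_1 \cup \Gamma_2 \rangle$. We obtain:

Proposition 5.4. $\Pi_V$ is stable under the meet operation $\wedge$ in $(\mathcal{C}_V; \preceq)$ given in
equation (9). If $\mathcal{G} = (\mathcal{V}_\mathcal{G}, \mathcal{E}_\mathcal{G}), \mathcal{H} = (\mathcal{V}_\mathcal{H}, \mathcal{E}_\mathcal{H}) \in \Pi_V$ are generated by
$\Gamma_\mathcal{G}, \Gamma_\mathcal{H} \in \Gamma_V$, then the colouring of $\mathcal{G} \wedge \mathcal{H}$ is generated by
$\Gamma_\mathcal{G} \vee \Gamma_\mathcal{H}$.

Proof: Let $\mathcal{G}, \mathcal{H} \in \Pi_V$ and $\Gamma_\mathcal{G}, \Gamma_\mathcal{H} \in \Gamma_V$ be as in the claim. Then the colour
classes in $\mathcal{V}_\mathcal{G}$ and $\mathcal{E}_\mathcal{G}$ are orbits of $\Gamma_\mathcal{G}$ in $V$ and $E_\mathcal{G}$, and similarly for
$\mathcal{V}_\mathcal{H}$, $\mathcal{E}_\mathcal{H}$ and $\Gamma_\mathcal{H}$. By definition of the meet operation in $(\mathcal{C}_V; \preceq)$, each
vertex colour class in $\mathcal{G} \wedge \mathcal{H} = (\mathcal{V}_{\mathcal{G} \wedge \mathcal{H}}, \mathcal{E}_{\mathcal{G} \wedge \mathcal{H}})$ can be expressed as a
union of vertex colour classes in $\mathcal{V}_\mathcal{G}$, and as a union of vertex colour classes in
$\mathcal{V}_\mathcal{H}$, and similarly for the edges. Thus the colouring of $\mathcal{G} \wedge \mathcal{H}$ is invariant under
the action of both groups $\Gamma_\mathcal{G}$ and $\Gamma_\mathcal{H}$, and therefore also under
$\Gamma_\mathcal{G} \vee \Gamma_\mathcal{H}$.

To show that the colouring of $\mathcal{G} \wedge \mathcal{H}$ is generated by $\Gamma_\mathcal{G} \vee \Gamma_\mathcal{H}$, we need to
show that whenever two vertices or edges in $\mathcal{G} \wedge \mathcal{H}$ are of the same colour, then there
exists $\sigma \in \Gamma_\mathcal{G} \vee \Gamma_\mathcal{H}$ which maps one of them to the other. We present the argument
for the edges only, as it can be trivially transferred to the vertices. So let $\alpha\beta$ and
$\gamma\delta$ be two edges in $\mathcal{G} \wedge \mathcal{H}$ of equal colour. Then there exists a sequence
$\alpha_0\beta_0, \ldots, \alpha_k\beta_k$ in $E_{\mathcal{G} \wedge \mathcal{H}}$ such that $\alpha_0\beta_0 = \alpha\beta$,
$\alpha_k\beta_k = \gamma\delta$, and $\alpha_{i-1}\beta_{i-1} \equiv \alpha_i\beta_i \ (\mathcal{E}_\mathcal{G})$ or
$\alpha_{i-1}\beta_{i-1} \equiv \alpha_i\beta_i \ (\mathcal{E}_\mathcal{H})$ for $1 \le i \le k$. There must therefore exist
$\sigma_i \in \Gamma_\mathcal{G} \cup \Gamma_\mathcal{H}$ such that $\alpha_{i-1}\beta_{i-1}$ is mapped to $\alpha_i\beta_i$ by
$\sigma_i$ for $1 \le i \le k$, giving that the product $\sigma_k \cdots \sigma_1 \in \Gamma_\mathcal{G} \vee \Gamma_\mathcal{H}$
maps $\alpha\beta$ to $\gamma\delta$. □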

Graphs $\mathcal{G}_4$, $\mathcal{G}_5$ and $\mathcal{G}_4 \wedge \mathcal{G}_5$ in Figure 4 illustrate the above result, as each of
the graphs lies in $\Pi_V$, with generating groups $\Gamma_4 = \langle (13)(24) \rangle$,
$\Gamma_5 = \langle (13) \rangle$ and $\Gamma_{4 \wedge 5} = \Gamma_4 \vee \Gamma_5 = \langle (13), (24) \rangle$ respectively.
Observing that the join $\mathcal{G}_4 \vee \mathcal{G}_5$, also displayed in Figure 4, does not lie in $\Pi_V$
establishes that $\Pi_V$ is generally not stable under the join operation in $(\mathcal{C}_V; \preceq)$.
However the following holds:

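Proposition 5.5. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$, let $\mathrm{Aut}(\mathcal{V}, \mathcal{E}) \le S(V)$ denote the
largest group leaving $(\mathcal{V}, \mathcal{E})$ invariant and let $(\mathcal{V}_{\mathrm{Aut}}, \mathcal{E}_{\mathrm{Aut}})$ denote the graph
colouring of $G = (V, E)$ given by the orbits of $\mathrm{Aut}(\mathcal{V}, \mathcal{E})$ in $V$ and $E$ respectively.
Then $\mathcal{G}$ has a supremum in $\Pi_V$ given by

$s_\Pi(\mathcal{G}) = (\mathcal{V}_{\mathrm{Aut}}, \mathcal{E}_{\mathrm{Aut}})$

Proof: The claim is trivially true if $\mathcal{G} \in \Pi_V$. So suppose $\mathcal{G} \in \mathcal{C}_V \setminus \Pi_V$. $\mathcal{G}$ is
modified to a larger graph by adding edge colour classes and splitting colour classes. As the
former will not enforce a permutation-generated colouring, to prove the claim we need to show
that $(\mathcal{V}_{\mathrm{Aut}}, \mathcal{E}_{\mathrm{Aut}})$ is the coarsest refinement of $(\mathcal{V}, \mathcal{E})$ which lies in $\Pi_V$.
This is clearly the case as $\mathrm{Aut}(\mathcal{V}, \mathcal{E})$ is the largest group which leaves $\mathcal{V}$ and
$\mathcal{E}$ invariant. □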

Theorem 5.6. $\mathcal{S}^+_{\Pi_V}$ is a complete non-distributive lattice with respect to model inclusion.
The meet operation is induced by the meet operation in $(\mathcal{C}_V; \preceq)$ given in equation (9).
The join of two models represented by graphs $\mathcal{G}, \mathcal{H} \in \Pi_V$ is represented by the graph
$s_\Pi(\mathcal{G} \vee \mathcal{H})$.

Proof: The proof is analogous to the proof of Theorem 5.3. In brief, Proposition 5.4 and
Lemma 2.1 give that $\Pi_V$ is a complete lattice with meet operation as claimed. As the zero and
unit in $(\mathcal{C}_V; \preceq)$ have permutation-generated colourings, with $\Gamma_0 = S(V)$ and
$\Gamma_1 = \langle \mathrm{Id} \rangle$, they are the zero and unit in $\Pi_V$.

By Proposition 5.5, the join of two graphs $\mathcal{G}, \mathcal{H} \in \Pi_V$ is given by $s_\Pi(\mathcal{G} \vee \mathcal{H})$.
The graphs displayed in Figure 10 establish non-distributivity of $\Pi_V$ as each of them has a
permutation-generated colouring, with $\Gamma_6 = \langle (124) \rangle$, $\Gamma_7 = \langle (234) \rangle$,
$\Gamma_8 = \langle (13), (24) \rangle$, $\Gamma_{7 \wedge 8} = S([4])$ and $\Gamma_{6 \vee 7} = \Gamma_{6 \vee 8} = \langle (24) \rangle$
respectively. The above directly translates to $\mathcal{S}^+_{\Pi_V}$, proving the claim. □

5.3 Models Represented by Regular and Vertex Regular Colourings

The structures of $R_V$ and $P_V$ turn out to be closely related and it is for that reason that we
treat them together. For brevity we abstain from giving explicit proofs of intermediate results,
but give all facts the reader will require to construct them. We shall employ the notion
of a factor graph:

Definition 5.7 (Frey (1998)). Let $f(Y)$ be a function in $Y$ which factorises as
$f(Y) = \prod_{i \in I} f_{A_i}(Y_{A_i})$ where $A_i \subseteq V$ and $f_{A_i}$ cannot be factorised further for
$i \in I$. Then the factor graph of $f$ is the graph $G_F = (V \cup F, E_F)$ with $F = \{f_{A_i}\}_{i \in I}$
being the set of factor vertices and $E_F = \{\alpha f_{A_i} \mid \alpha \in V, f_{A_i} \in F \text{ with } \alpha \in A_i\}$.

For $Y = (Y_\alpha)_{\alpha \in V}$ assumed to follow a $\mathcal{N}_{|V|}(\mu, \Sigma)$ distribution, the density $f(y)$
factorises as

$f(y) \propto \prod_{\alpha \in V} \exp\{-k_{\alpha\alpha}(y_\alpha - \mu_\alpha)^2/2\} \cdot \prod_{\{\alpha, \beta\} \subseteq V,\, \alpha \neq \beta} \exp\{-k_{\alpha\beta}(y_\alpha - \mu_\alpha)(y_\beta - \mu_\beta)\}$

giving that for the Gaussian distribution, either $A_i = \{\alpha\}$ or $A_i = \{\alpha, \beta\}$ for
$\alpha, \beta \in V$, with a factor being present if and only if the corresponding entry in
$\Sigma^{-1}$ is non-zero. Thus if the distribution of $Y$ is assumed to lie in the graphical Gaussian
model represented by graph $G = (V, E)$, by equation (3), each factor corresponds to a vertex in
$V$ or edge in $E$. The vertices in $V$ can clearly be identified with their factors so that the
factor graph of a graphical Gaussian model with graph $G = (V, E)$ equals

$G_F = (V \cup F, E_F)$ with $F = \{e \mid e \in E\}$ and $E_F = \{\alpha e \mid \alpha \in V, e \in E \text{ is incident with } \alpha \text{ in } G\}$.

This can be extended to the notion of a coloured factor graph, with an example given in Figure
11.

Figure 11: A coloured graph and the corresponding coloured factor graph.
(a) $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. (b) $\mathcal{G}_F = (\mathcal{V} \cup \mathcal{N}, E_F)$.

Definition 5.8. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ representing a graphical Gaussian model with
equality constraints, let $N$ be a set of nodes with each node representing an edge in $E$ and let
$\mathcal{N}$ be the colouring of $N$ in which nodes receive the same colour if and only if the
corresponding edges are equally coloured in $\mathcal{E}$. The coloured factor graph of the model is
defined to be the vertex and node coloured graph $\mathcal{G}_F = (\mathcal{V} \cup \mathcal{N}, E_F)$ with
$E_F = \{\alpha n \mid \alpha \in V, n \in N \text{ and } n \text{ represents an edge incident with } \alpha\}$. The set of
coloured factor graphs with vertex set $V$ is denoted $\mathcal{F}_V$.

We give four intermediate results whose proofs we omit for brevity.

Lemma 5.9. $\mathcal{F}_V$ and $\mathcal{C}_V$ are isomorphic. We denote the isomorphism by
$\phi_V : \mathcal{C}_V \to \mathcal{F}_V$.

Lemma 5.10. If $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ and $\phi_V(\mathcal{G}) = \mathcal{G}_F = (\mathcal{V} \cup \mathcal{N}, E_F)$ is the
corresponding factor graph, then $(\mathcal{V}, \mathcal{E}) \in R_V$ if and only if $(\mathcal{V} \cup \mathcal{N})$ is equitable
with respect to $G_F = (V \cup N, E_F)$.

Lemma 5.11 (McKay (1976)). If $P_1$ and $P_2$ are two partitions of $V$ both equitable with respect
to the same graph $G = (V, E)$, then so is their join $P_1 \vee P_2$.

Lemma 5.12 (McKay (1976)). For $G = (V, E)$ and a partition $P$ of $V$, there exists (up to the
order of cells) a unique coarsest partition that is finer than $P$ and equitable with respect to
$G$, to be denoted by $r_G(P)$.

Combined, Lemmas 5.9, 5.10 and 5.11 can be used to prove:

Proposition 5.13. $R_V$ is stable under the meet operation $\wedge$ in $(\mathcal{C}_V; \preceq)$ given in
equation (9).

Note that the graphs in Figure 4 illustrate the stability of $R_V$ under $\wedge$ while showing that
$R_V$ is generally not stable under $\vee$ in $(\mathcal{C}_V; \preceq)$. Further, Lemma 5.12 implies:

Proposition 5.14. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ and let $\phi_V(\mathcal{G}) = \mathcal{G}_F = (\mathcal{V} \cup \mathcal{N}, E_F)$ be
the corresponding coloured factor graph. Then $\mathcal{G}$ has a supremum in $R_V$ given by

$s_R(\mathcal{G}) = \phi_V^{-1}\big((r_{G_F}(\mathcal{V} \cup \mathcal{N}), E_F)\big)$
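Proposition 5.14 suggests a direct computation: build the factor graph, refine its vertex and
node colouring until it is equitable, and read the result back as a graph colouring. The Python
sketch below does this with standard iterative colour refinement, which is assumed here as a
stand-in for computing the coarsest equitable refinement of Lemma 5.12 and is not taken from the
article. Vertices are assumed not to be tuples, edges are represented as 2-tuples, and the
function name `regular_supremum` is hypothetical.

```python
def regular_supremum(vertex_classes, edge_classes):
    """Refine a vertex and edge colouring via its coloured factor graph.

    vertex_classes: iterable of sets of vertices (vertices must not be tuples).
    edge_classes:   iterable of sets of edges, each edge a 2-tuple of vertices.
    Returns (vertex colouring, edge colouring) as sets of frozensets.
    """
    # Factor graph: one node per edge, adjacent to its two end vertices.
    adjacency = {v: set() for cls in vertex_classes for v in cls}
    for cls in edge_classes:
        for e in cls:
            adjacency[e] = set(e)
            for v in e:
                adjacency[v].add(e)

    # Initial colouring: vertex classes first, then node (edge) classes.
    colour = {}
    for idx, cls in enumerate(list(vertex_classes) + list(edge_classes)):
        for x in cls:
            colour[x] = idx

    while True:
        signature = {}
        for x, neighbours in adjacency.items():
            counts = {}
            for y in neighbours:
                counts[colour[y]] = counts.get(colour[y], 0) + 1
            # Old colour plus number of neighbours in each colour class.
            signature[x] = (colour[x], tuple(sorted(counts.items())))
        relabel = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        refined = {x: relabel[signature[x]] for x in adjacency}
        stable = len(set(refined.values())) == len(set(colour.values()))
        colour = refined
        if stable:  # no class was split, so the partition is equitable
            break

    cells = {}
    for x, c in colour.items():
        cells.setdefault(c, set()).add(x)
    vertex_part = {frozenset(c) for c in cells.values()
                   if not any(isinstance(x, tuple) for x in c)}
    edge_part = {frozenset(c) for c in cells.values()
                 if any(isinstance(x, tuple) for x in c)}
    return vertex_part, edge_part


# Hypothetical example: path 1 - 2 - 3, one vertex colour, one edge colour.
new_vertices, new_edges = regular_supremum([{1, 2, 3}], [{(1, 2), (2, 3)}])
```

In the example the middle vertex has two incident edges of the single edge colour while the end
vertices have one, so the sketch separates vertex 2 from vertices 1 and 3 and keeps both edges
in one colour class.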

We conclude: 

Theorem 5.15. $\mathcal{S}^+_{R_V}$ is a complete non-distributive lattice with respect to model inclusion.
The meet operation is induced by the meet operation in $(\mathcal{C}_V; \preceq)$ given in equation (9).
The join of two models represented by graphs $\mathcal{G}, \mathcal{H} \in R_V$ is represented by the graph
$s_R(\mathcal{G} \vee \mathcal{H})$.

Proof: The proof is analogous to the proofs of Theorems 5.3 and 5.6. □

Lemma 5.12 further implies:

Proposition 5.16. $P_V$ is stable under the meet operation $\wedge$ in $(\mathcal{C}_V; \preceq)$ given in
equation (9).

The graphs in Figure 4 also illustrate the stability of $P_V$ under $\wedge$ while establishing that
$P_V$ is generally not stable under $\vee$ in $(\mathcal{C}_V; \preceq)$. Further:

Lemma 5.17. If $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ and $s_R(\mathcal{G}) = (\mathcal{V}_R, \mathcal{E}_R)$, then for all
$\mathcal{G}' = (\mathcal{V}', \mathcal{E}) \in \mathcal{C}_V$ with $\mathcal{V}_R \le \mathcal{V}' \le \mathcal{V}$, $s_R(\mathcal{G}') = (\mathcal{V}_R, \mathcal{E}_R)$.

Lemma 5.17 can be shown to imply:

Proposition 5.18. For $\mathcal{G} = (\mathcal{V}, \mathcal{E}) \in \mathcal{C}_V$ let $s_R(\mathcal{G}) = (\mathcal{V}_R, \mathcal{E}_R)$. Then $\mathcal{G}$ has a
supremum in $P_V$ given by

$s_P(\mathcal{G}) = (\mathcal{V}_R, \mathcal{E})$

We conclude:

Theorem 5.19. $\mathcal{S}^+_{P_V}$ and $\mathcal{R}^+_{P_V}$ are complete lattices with respect to model inclusion.
Their meet operation is induced by the meet operation in $(\mathcal{C}_V; \preceq)$ given in equation (9).
The join of two models represented by $\mathcal{G}, \mathcal{H} \in P_V$ is represented by the graph
$s_P(\mathcal{G} \vee \mathcal{H})$.

Proof: In complete analogy to the proofs of Theorems 5.3 and 5.6. □

6 Model Selection



One way to develop model selection procedures for the model classes considered in this article
is by adapting existing model search algorithms for unconstrained graphical Gaussian models.
Having shown each of the model classes to be a complete lattice, just as the set of standard
graphical models, it is natural to consider methods which exploit this structural property.
Prominent methods among them are stepwise procedures (Whittaker, 1990; Edwards, 2000),
the Edwards-Havranek model selection procedure (Edwards and Havranek, 1987), and, more
recently, neighbourhood selection with the lasso (Meinshausen and Bühlmann, 2006), stability
selection (Meinshausen and Bühlmann, 2010), and the SINful approach (Drton and Perlman,
2008).

A crucial difference between the search spaces of unconstrained graphical Gaussian models
and the models studied here is that while the former constitute a distributive lattice, the latter
are all non-distributive. This directly disqualifies neighbourhood selection with the lasso,
stability selection and the SINful approach, as they all require distributivity. Put explicitly,
while the just mentioned methods are algorithms for determining for each edge whether it is
to be present in the graph of the accepted model(s) or not, for the model classes considered
here not only the edge set needs to be determined, but also partitions of the vertices and present
edges into sets corresponding to equal model parameters. This turns model selection into a
fundamentally different problem.

For the rest of the article we focus on the Edwards-Havranek model selection procedure
and develop a corresponding algorithm for the lattice of models $\mathcal{S}^+_{B_V}$ represented by edge
regular colourings. We illustrate the algorithm with a brief summary of its very encouraging
performance for the data set described in Example 3.1. We further give a summary of an
Edwards-Havranek model search within the lattice of models with permutation-generated
colourings for the Frets' heads data described in Example 4.7. All formal results are given
without proof; however, they can be obtained by considering the partial ordering of $\mathcal{C}_V$.

6.1 The Edwards-Havranek Model Selection Procedure 

The Edwards-Havranek model selection procedure operates on model search spaces which are
lattices and is closely related to the all possible models approach but considerably faster. It is
based on the following two principles: (i) if a model is accepted then all models that include it
are (weakly) accepted, and (ii) if a model is rejected then all of its submodels are considered
to be (weakly) rejected.

The procedure starts by initially testing a set of models and assigns the accepted models to
a set $\mathcal{A}$ and the rejected models to a set $\mathcal{R}$. By assumption, all models larger than those in
$\mathcal{A}$ are (weakly) accepted and the ones smaller than those in $\mathcal{R}$ are (weakly) rejected, so that
only $\min \mathcal{A}$, the smallest models in $\mathcal{A}$, and $\max \mathcal{R}$, the largest models in $\mathcal{R}$, are of
interest. The procedure repeatedly updates $\min \mathcal{A}$ and $\max \mathcal{R}$ and terminates once the set to
be updated remains unchanged, when it returns $\min \mathcal{A}$. The method to determine whether a
model is to be rejected can be any suitable statistical test in accordance with the principle of
coherence (Gabriel, 1969), stating that a test should not accept a model while rejecting a larger
one.

Following Edwards and Havranek (1987), for a set of models $S$ let $D_a(S)$ denote the set
of models in the search lattice, $L$ say, which are smallest with the property that they are not
contained in any model in $S$,

$D_a(S) = \min\{d \in L \mid d \not\preceq s \text{ for all } s \in S\}$

and let $D_r(S)$ be the set of largest models that do not contain any model in $S$,

$D_r(S) = \max\{d \in L \mid s \not\preceq d \text{ for all } s \in S\}$

$D_a(S)$ is referred to as the acceptance dual of $S$, and $D_r(S)$ as the rejection dual of $S$. The
procedure may then be summarised as:

1. Test an initial set of models and assign the accepted models to $\mathcal{A}$ and the rejected models
to $\mathcal{R}$.

2. Choose between 3 and 4.

3. Test the models in $D_r(\mathcal{A}) \setminus \mathcal{R}$. If all are rejected, stop; otherwise, update $\mathcal{A}$ and
$\mathcal{R}$ and go to 2.

4. Test the models in $D_a(\mathcal{R}) \setminus \mathcal{A}$. If all are accepted, stop; otherwise, update $\mathcal{A}$ and
$\mathcal{R}$ and go to 2.

Acceptance and rejection duals of sets of models can be computed in a recursive manner
by using the following two relations. If $S$ and $T$ are two sets of models, then

$D_a(S \cup T) = \min\{s \vee t \mid s \in D_a(S), t \in D_a(T)\}$
$D_r(S \cup T) = \max\{s \wedge t \mid s \in D_r(S), t \in D_r(T)\}$

Thus describing the duals of a single model is enough.

6.2 Models Represented by Edge Regular Colourings

Proposition 6.1. Let $\mathcal{S}^+(\mathcal{V}, \mathcal{E}) \in \mathcal{S}^+_{B_V}$ be a model represented by graph
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with edge regular colouring and underlying uncoloured graph $G = (V, E)$. Then
the acceptance dual $D_a(\mathcal{S}^+(\mathcal{V}, \mathcal{E}))$ of $\mathcal{S}^+(\mathcal{V}, \mathcal{E})$ in $\mathcal{S}^+_{B_V}$ contains all models
represented by coloured graphs $\mathcal{G}_a = (\mathcal{V}_a, \mathcal{E}_a)$ satisfying

(1i) $\mathcal{V}_a = \{V_1, V_2\}$ such that $\mathcal{V}_a \not\geq \mathcal{V}$ and $\mathcal{E}_a = \emptyset$, or
(1ii) $\mathcal{V}_a = \{V\}$ and $\mathcal{E}_a = \{E_a\}$ with $E_a \neq \emptyset$ and $\mathcal{E}_a \not\geq \mathcal{E}$.

Put into words, models in the acceptance dual of $\mathcal{S}^+(\mathcal{V}, \mathcal{E}) \in \mathcal{S}^+_{B_V}$ are either represented
by the empty graph with two vertex colour classes which are not unions of colour classes in
$\mathcal{V}$, or they are represented by graphs in which all vertices are of the same colour, as are the
edges, and the edge set is not a union of colour classes in $\mathcal{E}$. For example, the coloured graphs
in Figure 12 display models which lie in the acceptance dual of the model represented by the
edge regular colouring in Figure 5(a).

By definition, acceptance duals are used to test whether there exist models immediately
larger than $\max \mathcal{R}$ which can be rejected. Effectively, graphs of type (1i) refine the colouring
of the maximally rejected models, while graphs of type (1ii) add edge colour classes, which
both give larger models.

Figure 12: Acceptance dual corresponding to the graph in Figure 5(a). (a) A graph of type (1i).
(b) A graph of type (1ii).

Proposition 6.2. Let $\mathcal{S}^+(\mathcal{V}, \mathcal{E}) \in \mathcal{S}^+_{B_V}$ be a model represented by graph
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with edge regular colouring and underlying uncoloured graph $G = (V, E)$, and for
a discrete set $A$, let $\mathrm{atom}(A)$ denote the partition of $A$ into atomic sets. Then the rejection
dual $D_r(\mathcal{S}^+(\mathcal{V}, \mathcal{E}))$ of $\mathcal{S}^+(\mathcal{V}, \mathcal{E})$ in $\mathcal{S}^+_{B_V}$ contains all models represented by
coloured graphs $\mathcal{G}_r = (\mathcal{V}_r, \mathcal{E}_r)$ satisfying

(2i) $\mathcal{V}_r = \{\{\alpha, \beta\}\} \cup \mathrm{atom}(V \setminus \{\alpha, \beta\})$ such that $\mathcal{V}_r \not\leq \mathcal{V}$ and
$\mathcal{E}_r = \{\alpha\beta \mid \alpha, \beta \in V\}$, or

(2ii) $\mathcal{V}_r = \mathrm{atom}(V)$ and $\mathcal{E}_r = \mathrm{atom}(\{\alpha\beta \mid \alpha, \beta \in V\} \setminus \{e\})$ with $e \in E$, or

(2iii) $\mathcal{V}_r = \{\{\alpha, \beta\}, \{\gamma, \delta\}\} \cup \mathrm{atom}(V \setminus \{\alpha, \beta, \gamma, \delta\})$ with $\alpha, \beta$ and
$\gamma, \delta$ being of the same colour in $\mathcal{V}$ and
$\mathcal{E}_r = \{\{\alpha\gamma, \beta\delta\}\} \cup \mathrm{atom}(\{\alpha\beta \mid \alpha, \beta \in V\} \setminus \{\alpha\gamma, \beta\delta\})$, where we may
have $\alpha = \beta$ or $\gamma = \delta$ but not both, such that $(\mathcal{V}, \mathcal{E}) \not\preceq (\mathcal{V}_r, \mathcal{E}_r)$.

Graphs representing models in the rejection dual of a model represented by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$
almost represent the unrestricted saturated model except for a minor modification: in graphs
of type (2i) two vertices which are not of the same colour in $\mathcal{V}$ form the only composite colour
class in $\mathcal{V}_r$, while graphs of type (2ii) are missing an edge present in $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Graphs
of type (2iii) have a pair of equally coloured edges which are not of the same colour in $\mathcal{E}$, and
give the end vertices of the edges an appropriate colouring for the graph colouring to be edge
regular. Examples of coloured graphs representing models in the rejection dual of the model
represented by the graph displayed in Figure 5(a) are given in Figure 13.






Figure 13: Rejection dual corresponding to the graph in Figure 5(a). (a) A graph of type (2i).
(b) A graph of type (2ii). (c) A graph of type (2iii).

Rejection duals contain models which lie immediately below $\min \mathcal{A}$. Graphs of type (2i)
merge vertex colour classes, graphs of type (2ii) cause edge colour classes to be dropped and
graphs of type (2iii) merge edge colour classes, as well as vertex colour classes to ensure that
the resulting model is edge regular. All operations give graphs which represent smaller models.

It can be shown that while $|D_a(\mathcal{S}^+(\mathcal{V}, \mathcal{E}))|$ grows super-exponentially in the number of
variables $|V|$ (at rate $O(2^{|V|^2/2})$), the size of the rejection dual $|D_r(\mathcal{S}^+(\mathcal{V}, \mathcal{E}))|$ grows
polynomially in $|V|$ (at rate $O(|V|^4)$), so that from a computational point of view working with
rejection duals only is much more efficient. The following algorithm is therefore most efficient
for models with edge regular colourings.

1. Test an initial set of models and assign each to $\mathcal{R}$ if it is rejected and to $\mathcal{A}$ otherwise.

2. Test the models in $D_r(\mathcal{A}) \setminus \mathcal{R}$. If all are rejected, stop. Otherwise update $\mathcal{A}$ and
$\mathcal{R}$ and repeat.

We executed the above algorithm for the Mathematics marks data set described in Example
3.1 with the saturated uncoloured model as our initial set of accepted models $\mathcal{A}$ and let
$\mathcal{R} = \emptyset$ initially. Models were tested for acceptance by performing a likelihood ratio test
relative to the saturated unconstrained model at significance level 5% using functionality
implemented in the R package gRc (Højsgaard and Lauritzen, 2007). The algorithm fitted 232
models, out of a total of $1.3 \cdot 10^6$, in 8 stages before arriving at 4 minimally accepted models
whose graphs are displayed in Figure 14 together with their BIC values. (The 232 models are
distributed among the stages as follows. 1: 20 (6 accepted), 2: 21 (19 accepted), 3: 41 (40
accepted), 4: 56 (56 accepted), 5: 55 (55 accepted), 6: 29 (29 accepted), 7: 9 (9 accepted) and
8: 1 (1 accepted).)
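The rejection-dual-only search used here can be sketched in a few lines of Python. The sketch is
only a toy illustration under loud assumptions: the search lattice is the lattice of subsets of
a three-element set rather than a lattice of coloured graphs, the rejection dual is found by
brute force rather than via Proposition 6.2, and the statistical test is replaced by a hypothetical
coherent rule `accept`. All function names are invented for the example.

```python
from itertools import combinations


def powerset(items):
    """All subsets of `items` as frozensets (the toy search lattice)."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def minimal(models):
    models = set(models)
    return {m for m in models if not any(o < m for o in models)}


def maximal(models):
    models = set(models)
    return {m for m in models if not any(m < o for o in models)}


def rejection_dual(lattice, accepted):
    """Largest models that do not contain any accepted model (brute force)."""
    return maximal(d for d in lattice if not any(a <= d for a in accepted))


def eh_search(lattice, initial, accept):
    """Step 1: test the initial models. Step 2: test D_r(A) \\ R until all fail."""
    A = {m for m in initial if accept(m)}
    R = {m for m in initial if not accept(m)}
    tested = len(initial)
    while True:
        candidates = rejection_dual(lattice, A) - R
        newly_accepted = {m for m in candidates if accept(m)}
        tested += len(candidates)
        R = maximal(R | (candidates - newly_accepted))
        if not newly_accepted:          # all candidates rejected: stop
            return minimal(A), tested
        A = minimal(A | newly_accepted)


# Hypothetical coherent test: a model is accepted if it contains {0} or {1, 2}.
lattice = powerset(range(3))
accept = lambda m: {0} <= m or {1, 2} <= m
minimal_accepted, n_tested = eh_search(lattice, [frozenset(range(3))], accept)
```

On this toy lattice the search starts from the largest model and returns the two minimally
accepted models {0} and {1, 2}. For a real application the brute-force `rejection_dual` would be
replaced by the explicit description of rejection duals for edge regular colourings.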




Figure 14: Graphs of minimally accepted models in $\mathcal{S}^+_{B_V}$, on the variables Mechanics,
Vectors, Analysis and Statistics among others. BIC values: $\mathcal{G}_1$: 2601.617, $\mathcal{G}_2$: 2600.017,
$\mathcal{G}_3$: 2603.376, $\mathcal{G}_4$: 2591.468.

The uncoloured graphs underlying the graphs in Figure 14 contain the graph in Figure 1(a)
as a subgraph. $\mathcal{G}_1$, $\mathcal{G}_2$ and $\mathcal{G}_3$ only differ by one edge and $\mathcal{G}_4$ has exactly the same edge
set. Thus the conditional independence structures largely agree. The minimally accepted model
with the lowest BIC value is represented by $\mathcal{G}_4$. It is different from but not dissimilar to the
RCON model fitted in Højsgaard and Lauritzen (2008), whose graph is displayed in Figure
1(b). That model has a slightly lower BIC value of 2587.404 but no specific properties, and it
was chosen in an ad hoc manner. Note that the model fitted in Højsgaard and Lauritzen (2008)
is not edge regular so that it could not have been considered by the algorithm.

This example suggests that an Edwards-Havranek model selection procedure for models 
with edge regular colourings may be feasible in general. 

6.3 Models Represented by Permutation-Generated Colourings 

The class of permutation-generated colourings $\Pi_V$ is more complex in its structure than $B_V$
and therefore the duals $D_a(\mathcal{S}^+(\mathcal{V}, \mathcal{E}))$ and $D_r(\mathcal{S}^+(\mathcal{V}, \mathcal{E}))$ of a model
$\mathcal{S}^+(\mathcal{V}, \mathcal{E})$ cannot be given in a purely combinatorial form in the graph colouring. A sound
understanding of the relationship between a group $\Gamma$ and its orbits in $V$ and $V \times V$ is
required in order to design a general algorithm that is, in principle, applicable to any variable
set $V$. To illustrate the underlying principles we provide a brief summary of an
Edwards-Havranek model search in $\mathcal{S}^+_{\Pi_V}$ for the Frets' heads data described in Example 4.7.

For $V = \{B_1, B_2, L_1, L_2\}$ the symmetric group $S(V)$ contains $4! = 24$ permutations and
has 30 subgroups, 17 of which are generated by a single permutation. Let $\mathcal{K}_V$ denote the set
of colourings of the complete graph these groups generate. The graph colourings in $\mathcal{K}_V$ which
are generated by a single permutation, i.e. by $\Gamma = \langle \sigma \rangle$ for $\sigma \in S(V)$, are displayed in
Figure 15, where, for the sake of legibility, we label the vertices by $V = \{1, 2, 3, 4\}$. The
remaining 13 subgroups generate only 5 distinct colourings in $\mathcal{K}_{[4]}$, all of which are shown in
Figure 16 with one of their generating groups.
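The subgroup counts quoted above can be reproduced by brute force. The following Python sketch
represents a permutation of $\{0, 1, 2, 3\}$ as a tuple of images and closes a set of generators
under composition. It relies on the fact that every subgroup of the symmetric group on four
points can be generated by at most two elements, so enumerating all pairs of generators finds
every subgroup; the helper names `compose` and `generated_group` are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Composition p after q: i is sent to p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def generated_group(generators, n):
    """Subgroup generated by `generators`, as a frozenset of permutations."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)


n = 4
perms = list(permutations(range(n)))

# Subgroups generated by a single permutation.
cyclic_subgroups = {generated_group([p], n) for p in perms}
# All subgroups, using that two generators always suffice here.
all_subgroups = {generated_group([p, q], n) for p in perms for q in perms}

n_cyclic = len(cyclic_subgroups)
n_total = len(all_subgroups)
n_non_cyclic = n_total - n_cyclic
```

The three counts correspond to the 17 subgroups generated by a single permutation, the 30
subgroups in total and the remaining 13 subgroups mentioned in the text.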





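Figure 15: Colourings in $\mathcal{K}_{[4]}$ which are generated by $\Gamma = \langle \sigma \rangle$ for some
$\sigma \in S(V)$. Generating groups shown include $\Gamma_8 = \langle (123) \rangle$, $\Gamma_9 = \langle (124) \rangle$,
$\Gamma_{10} = \langle (134) \rangle$, $\Gamma_{12} = \langle (12)(34) \rangle$, $\Gamma_{13} = \langle (13)(24) \rangle$,
$\Gamma_{14} = \langle (14)(23) \rangle$, $\Gamma_{15} = \langle (1234) \rangle$, $\Gamma_{16} = \langle (1243) \rangle$ and
$\Gamma_{17} = \langle (1324) \rangle$.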



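Figure 16: Remaining colourings in $\mathcal{K}_{[4]}$, with generating groups
$\Gamma_{18} = \langle (12), (34) \rangle$, $\Gamma_{19} = \langle (13), (24) \rangle$, $\Gamma_{20} = \langle (14), (23) \rangle$,
$\Gamma_{21} = \langle (12)(34), (14)(23) \rangle$ and $\Gamma_{22} = S([4])$.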



The search space $\Pi_{[4]}$ consists of all models which are represented by one of the 22 graphs
in $\mathcal{K}_{[4]} = \{\mathcal{G}_1, \ldots, \mathcal{G}_{22}\}$ displayed in Figures 15 and 16, together with those
represented by graphs which can be obtained from the above by dropping edge colour classes.
Thus the size of the total search space is

$|\Pi_{[4]}| = N_1 + 6N_2 + 4N_8 + 3N_{12} + 3(N_{15} - 1) + 3(N_{18} - 4) + (N_{21} - 4) + N_{22}$
$= 2^6 + 6 \cdot 2^4 + 4 \cdot 2^2 + 3 \cdot 2^4 + 3(2^2 - 1) + 3(2^3 - 4) + (2^3 - 4) + 2 = 251$

where $N_i$ denotes the number of graphs one can obtain from graph $\mathcal{G}_i$. The subtracted
correction terms prevent some graphs from being counted more than once.
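The arithmetic in the count above is easy to check mechanically. The short Python sketch below
simply re-evaluates the displayed sum; the list layout, with one entry per term giving the
multiplicity, the value of $N_i$ and the subtracted correction, is an illustrative choice.

```python
# One entry per term: (multiplicity, N_i, subtracted correction).
terms = [
    (1, 2**6, 0),   # N_1
    (6, 2**4, 0),   # 6 N_2
    (4, 2**2, 0),   # 4 N_8
    (3, 2**4, 0),   # 3 N_12
    (3, 2**2, 1),   # 3 (N_15 - 1)
    (3, 2**3, 4),   # 3 (N_18 - 4)
    (1, 2**3, 4),   # N_21 - 4
    (1, 2, 0),      # N_22
]

contributions = [mult * (count - correction) for mult, count, correction in terms]
search_space_size = sum(contributions)
```

The individual contributions are 64, 96, 16, 48, 9, 12, 4 and 2, which add up to the stated
total.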

Figure 17 displays the Hasse diagram of $\mathcal{K}_{[4]}$. By construction, it contains only graphs which
represent the saturated model. An Edwards-Havranek model selection procedure searches
along the full Hasse diagram of all models in $\Pi_{[4]}$, which contains all 251 models and has the
diagram in Figure 17 as a subgraph. At each stage, the search moves along the edges in the
diagram, passing each model at most once. Once a model has been rejected, all models below
it are excluded from the future search; once a model has been accepted all models above it are
excluded.



Figure 17: Hasse diagram of $\mathcal{K}_{[4]}$, ranging from $\mathcal{G}_1$ to $\mathcal{G}_{22}$.

Exploiting the demonstrated lattice structure of $\Pi_{[4]}$, we applied the algorithm to the Frets'
heads data described in Example 4.7 with $\mathcal{A}$ initially consisting of the saturated unrestricted
model and $\mathcal{R} = \emptyset$. After testing 48 models in 4 stages the algorithm arrived at 9 minimally
accepted models whose graphs and generating groups are given in Figure 18. (The models are
distributed between the stages in the following way. 1: 15 (9 accepted), 2: 16 (16 accepted), 3:
13 (13 accepted), 4: 4 (3 accepted).)

The minimally accepted model with the lowest BIC value is represented by graph
$\mathcal{H}_9$, whose BIC value is considerably lower than the BIC value 471.2982 of the model fitted
in Højsgaard and Lauritzen (2008) whose graph is displayed in Figure 7(b). Further, the
model selected in Højsgaard and Lauritzen (2008) is a supermodel, in fact the supremum,
of the models represented by $\mathcal{H}_7$ and $\mathcal{H}_8$, with a further edge to complete the four cycle.
Interestingly, $\mathcal{H}_2$ and $\mathcal{H}_3$ are two of the graphs found in Section 8.3 in Whittaker (1990), the
other two being the underlying uncoloured graphs of $\mathcal{H}_7$ and $\mathcal{H}_8$. Note that the BIC value
of the complete symmetry model, whose graph is displayed on the right in Figure 7(b), lies
between the smallest and the second smallest BIC values of the minimally accepted models,
the corresponding graphs being $\mathcal{H}_9$ and $\mathcal{H}_1$ respectively. However it was (weakly) rejected by
the procedure as it is a submodel of a model rejected in stage 1.

Figure 18: Graphs of minimally accepted models in $\Pi_{[4]}$ for Frets' heads data. The nine
panels show the graphs $\mathcal{H}_1, \ldots, \mathcal{H}_9$ with their generating groups and BIC values
(458.6692, 471.1172, 460.7432, 458.9100, 470.4083, 459.2543, 465.8199, 451.3409 and 466.0704).

7 Discussion 

As we argued, graphical Gaussian models with equality constraints are a promising model class
as they combine parsimony in the number of parameters with the concise and efficient graphical
models framework. We studied two model types introduced by Højsgaard and Lauritzen
(2008): RCON models which place equality restrictions on the concentration matrix
$\Sigma^{-1}$ and RCOR models which restrict the diagonal of $\Sigma^{-1}$ and the partial correlations,
which can both be represented by vertex and edge coloured graphs.

We showed four model classes within the sets of RCON and RCOR models, each possessing 
desirable statistical properties and being more readily interpretable than RCON and RCOR 
models in general, to form complete non-distributive lattices. This qualifies each of them for 
an Edwards-Havranek model selection procedure. Two model classes, those represented by 
edge regular and permutation-generated colourings respectively, are most readily interpretable 
and possess the most tractable structure out of the four and are thus most suitable for a model 
search. 

For the former model class we have developed an Edwards-Havranek model selection 
algorithm, and demonstrated an encouraging performance for the data set previously described 
in Example 3.1. We further illustrated the principal functionality of the Edwards-Havranek 
procedure on the lattice of models represented by permutation-generated colourings with the 
example of Frets' heads data from Example 4.7. Here as well, the algorithm performed in a 
satisfactory fashion. In order to fully generalise it to work for any number of variables |V|, 
further investigation into the relationship between permutation groups acting on $V$ and their 
orbits in $V$ and $V \times V$ is necessary. 

Some potential concerns need to be mentioned: firstly, while the algorithm's performance 
in the above examples was encouraging, it has to be taken into account that the number 
of variables is rather small in both cases. Further, it is at this stage unknown how much 
this behaviour relied on strong/weak conditional independence and symmetry relations in the 
data sets considered. It may be that the number of models to be tested can still grow in 
an unmanageable fashion. Secondly, a general concern with the Edwards-Havranek model 
selection procedure is that its sampling properties are intractable. In particular the procedure 
does not control the overall error rate. 

Especially in view of the above, it may be worthwhile to explore alternative 
model selection approaches. We argued that neighbourhood selection with the lasso 
(Meinshausen and Bühlmann, 2006), stability selection (Meinshausen and Bühlmann, 2010), 
and the SINful approach (Drton and Perlman, 2008) were not directly applicable to the 
lattices of models studied in this article due to their non-distributivity. Modified variants 
may still be feasible, which could be investigated. 

One further viable alternative may be a symmetry variant of the graphical lasso 
(Friedman et al., 2008; Ravikumar et al., 2008), which in its original form seeks to maximise 
the penalised log-likelihood 

$$\log\det \Sigma^{-1} - \operatorname{tr}(S\Sigma^{-1}) - \rho \|\Sigma^{-1}\|_1 \tag{12}$$

over non-negative definite matrices $\Sigma^{-1}$, with $S$ denoting the empirical covariance matrix of 
the observations, $\operatorname{tr}(\cdot)$ being the trace, $\|\cdot\|_1$ the $\ell_1$ norm giving the sum of the absolute values 
of the elements in the argument matrix and $\rho$ being the penalisation parameter. Testing for 
equality constraints on the entries of $\Sigma^{-1} = (k_{\alpha\beta})_{\alpha,\beta \in V}$ can be enforced by replacing equation 
(12) by the following function 

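$$\log\det \Sigma^{-1} - \operatorname{tr}(S\Sigma^{-1}) - \rho_1 \|\Sigma^{-1}\|_1 - \rho_2 \sum_{\alpha \neq \beta} |k_{\alpha\alpha} - k_{\beta\beta}| - \rho_3 \sum_{\alpha\beta \neq \gamma\delta} |k_{\alpha\beta} - k_{\gamma\delta}|.$$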

This is in direct analogy to the development of the fused lasso (Tibshirani et al., 2005) from 
the standard lasso for linear regression. However, how to maximise the function for a given 
model class, for example over models with edge regular colourings to ensure scale invariance, 
seems non-trivial. 
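
To make the proposed penalised function concrete, the following Python sketch evaluates it for a given candidate concentration matrix. It is illustrative only and rests on assumptions not fixed by the text: the function names are hypothetical, the penalty is taken to consist of an $\ell_1$ term with its own parameter `rho1` plus two fusion terms with parameters `rho2` and `rho3`, and the fusion sums run over unordered pairs of diagonal and of off-diagonal entries. The sketch only evaluates the objective; it does not attempt the maximisation, which remains the non-trivial part.

```python
import math
from itertools import combinations


def log_det_spd(K):
    """Log-determinant of a symmetric positive definite matrix
    (list of lists), computed via a Cholesky factorisation."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def fused_glasso_objective(K, S, rho1, rho2, rho3):
    """Evaluate the symmetry-fused penalised log-likelihood at K.

    K : candidate concentration matrix, S : empirical covariance matrix.
    Assumption: fusion sums are taken over unordered pairs.
    """
    n = len(K)
    trace_SK = sum(S[i][j] * K[j][i] for i in range(n) for j in range(n))
    # l1 norm: sum of absolute values of all entries, diagonal included.
    l1 = sum(abs(K[i][j]) for i in range(n) for j in range(n))
    diag = [K[i][i] for i in range(n)]
    off = [K[i][j] for i in range(n) for j in range(i + 1, n)]
    # Fusion penalties shrink differences between like entries towards zero.
    fuse_diag = sum(abs(a - b) for a, b in combinations(diag, 2))
    fuse_off = sum(abs(a - b) for a, b in combinations(off, 2))
    return (log_det_spd(K) - trace_SK
            - rho1 * l1 - rho2 * fuse_diag - rho3 * fuse_off)


# A fully symmetric candidate: equal diagonal and equal off-diagonal entries.
K_sym = [[2.0, 0.5, 0.5], [0.5, 2.0, 0.5], [0.5, 0.5, 2.0]]
S_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
value = fused_glasso_objective(K_sym, S_id, rho1=0.1, rho2=1.0, rho3=1.0)
```

For a candidate such as `K_sym`, whose diagonal entries coincide and whose off-diagonal entries coincide, both fusion terms vanish, so the value does not depend on `rho2` or `rho3`. This is the sense in which the additional penalties favour equality constraints.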